Getting Started


This Getting Started guide provides quick reference information on installing Analytics Zoo, running Analytics Zoo applications, and developing your own applications with Analytics Zoo.


1. Try Analytics Zoo

Users can easily try Analytics Zoo in Docker or on Google Colab without installing it.
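As a quick sketch of the Docker route (the image name and tag below are assumptions; check the Docker guide for the current ones):

```shell
# Pull the Analytics Zoo image and start an interactive shell in it.
# Image name/tag are assumptions -- verify against the Docker guide.
docker pull intelanalytics/analytics-zoo:latest
docker run -it --rm intelanalytics/analytics-zoo:latest bash
# Inside the container, the bundled examples and notebooks are ready to run.
```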

Please also check the examples for various Analytics Zoo features, such as distributed TensorFlow and PyTorch on Spark, DL support in the Spark ML pipeline, RayOnSpark, Cluster Serving, and AutoML.


2. Install Analytics Zoo

Installation instructions are available for both Python and Scala users.

2.1 Python
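For Python users, the package can be installed from PyPI (the pinned version number below is illustrative only):

```shell
# Install the latest Analytics Zoo release from PyPI
# (a conda or virtualenv environment is recommended).
pip install analytics-zoo

# Or pin a specific release; the version number here is illustrative.
pip install analytics-zoo==0.9.0
```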

2.2 Scala
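For Scala users, one common path is to download a prebuilt release package and put the Analytics Zoo jar on the Spark classpath; every path and file name below is a placeholder for your downloaded release:

```shell
# ANALYTICS_ZOO_HOME and the jar name are placeholders --
# substitute the location and jar of the release you downloaded.
export ANALYTICS_ZOO_HOME=/path/to/analytics-zoo

spark-shell \
  --jars ${ANALYTICS_ZOO_HOME}/lib/analytics-zoo-jar-with-dependencies.jar
```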


3. Run Analytics Zoo Applications

Analytics Zoo applications can run on remote or cloud resources, such as YARN, K8s, Databricks, or Google Dataproc.

3.1 Run on YARN
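A `spark-submit` sketch for a YARN cluster; the script name and resource sizes are placeholders, and a real submission would also ship the Analytics Zoo jar and Python dependencies:

```shell
# Submit a Python Analytics Zoo application to YARN.
# Resource sizes and train.py are placeholders; add --jars/--py-files
# for the Analytics Zoo artifacts as described in the YARN guide.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 4 \
  --executor-cores 8 \
  --executor-memory 10g \
  train.py
```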

3.2 Run on K8s
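A `spark-submit` sketch for Kubernetes; the API server URL, container image, and script name are placeholders:

```shell
# Submit to a Kubernetes cluster. The master URL and container image
# are placeholders -- the image must contain Spark and Analytics Zoo.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<your-analytics-zoo-image> \
  train.py
```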

3.3 Run on Databricks

3.4 Run on Google Dataproc


4. Develop Analytics Zoo Applications

Analytics Zoo provides comprehensive support for building end-to-end, integrated data analytics and AI applications.

4.1 TensorFlow
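As an illustrative sketch (not runnable standalone: it assumes a Spark cluster and Analytics Zoo's TFPark API), TensorFlow users can wrap an ordinary `tf.keras` model for distributed training on Spark:

```python
# Illustrative sketch only: assumes Analytics Zoo's TFPark API
# (zoo.tfpark.KerasModel) and a SparkContext prepared by init_nncontext().
import tensorflow as tf
from zoo import init_nncontext
from zoo.tfpark import KerasModel

sc = init_nncontext()  # create/get the SparkContext with Analytics Zoo set up

# An ordinary tf.keras model...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# ...wrapped for distributed training on Spark.
zoo_model = KerasModel(model)
# zoo_model.fit(x, y, batch_size=32, distributed=True)  # schematic call
```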

4.2 PyTorch

PyTorch users can load a Torch model and use either:

- the NNFrames APIs, to run Spark ML Pipelines and DataFrame operations with PyTorch support, or
- the Estimator APIs, to train and evaluate PyTorch models in a distributed fashion on Spark.
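The NNFrames route can be sketched as follows; this is illustrative only (it assumes Analytics Zoo's `TorchModel`/`TorchLoss`/`NNEstimator` APIs and a running Spark context, and `train_df` is a placeholder Spark DataFrame):

```python
# Illustrative sketch only: assumes Analytics Zoo's TorchModel / NNEstimator
# APIs; not runnable without Spark and Analytics Zoo installed.
import torch.nn as nn
from zoo import init_nncontext
from zoo.pipeline.api.torch import TorchModel, TorchLoss
from zoo.pipeline.nnframes import NNEstimator

sc = init_nncontext()

# An ordinary PyTorch model and loss...
torch_model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
torch_loss = nn.MSELoss()

# ...wrapped so a Spark ML Pipeline can train them (the NNFrames route).
zoo_model = TorchModel.from_pytorch(torch_model)
zoo_loss = TorchLoss.from_pytorch(torch_loss)
estimator = NNEstimator(zoo_model, zoo_loss)
# estimator.fit(train_df)  # train_df is a placeholder Spark DataFrame
```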

4.3 BigDL

BigDL users can use either:

- the NNFrames APIs, to run Spark ML Pipelines and DataFrame operations with BigDL support, or
- the Keras-style APIs (with autograd support) to build and train BigDL models.
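The Keras-style route can be sketched as follows; this is illustrative only (it assumes Analytics Zoo's `zoo.pipeline.api.keras` API on top of BigDL and a running Spark context):

```python
# Illustrative sketch only: assumes Analytics Zoo's Keras-style API on BigDL;
# not runnable without Spark and Analytics Zoo installed.
from zoo import init_nncontext
from zoo.pipeline.api.keras.models import Sequential
from zoo.pipeline.api.keras.layers import Dense

sc = init_nncontext()

# Build and compile a model with the familiar Keras-style idiom.
model = Sequential()
model.add(Dense(16, activation="relu", input_shape=(10,)))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mse")
# model.fit(x, y, batch_size=32, nb_epoch=2)  # schematic call
```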

4.4 Cluster Serving

Analytics Zoo Cluster Serving is a real-time distributed serving solution for deep learning models (including TensorFlow, PyTorch, Caffe, BigDL, and OpenVINO). Follow the Cluster Serving Programming Guide to run Cluster Serving; the Cluster Serving API Guide explains the APIs in more detail.
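In outline, the serving workflow looks like this (script names follow the Cluster Serving guide; all paths are placeholders):

```shell
# Install the Python package, then initialize and start Cluster Serving.
pip install analytics-zoo
cluster-serving-init     # fetches dependencies and creates config.yaml
# Edit config.yaml to point at your exported model directory, then:
cluster-serving-start
```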

4.5 AutoML

Analytics Zoo provides scalable AutoML support for time series prediction, including automatic feature generation, model selection, and hyper-parameter tuning. See the AutoML Overview for a high-level description of the AutoML framework, and the AutoML Programming Guide and API Guide for the details.
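A schematic sketch of the time series AutoML flow, assuming the `TimeSequencePredictor` API from `zoo.automl` (class and module names may differ across releases; `train_df` and the column names are placeholders, and this is not runnable standalone):

```python
# Schematic only: assumes zoo.automl's TimeSequencePredictor; the column
# names and the pandas DataFrame `train_df` are placeholders.
from zoo.automl.regression.time_sequence_predictor import TimeSequencePredictor

tsp = TimeSequencePredictor(dt_col="datetime", target_col="value")
# fit() searches features, models, and hyper-parameters, returning a pipeline.
pipeline = tsp.fit(train_df)
# predictions = pipeline.predict(test_df)  # test_df is a placeholder
```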