Getting Started
This Getting Started document provides a quick reference for installing Analytics Zoo, running applications, and developing your own applications with Analytics Zoo.
1. Try Analytics Zoo
Users can easily try Analytics Zoo with Docker or Google Colab without installing it. For more information:
- Check the Docker User Guide
- Check the Google Colab Guide page
Please also check the examples for various Analytics Zoo features (such as distributed TensorFlow and PyTorch on Spark, DL support in Spark ML pipeline, RayOnSpark, Cluster Serving, AutoML, etc.)
2. Install Analytics Zoo
Analytics Zoo installation methods are available for Python and Scala users.
2.1 Python
- Check the Python User Guide for how to install Analytics Zoo in a Python environment.
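As a quick sketch, installation is typically a single pip command (the package name below assumes the PyPI release; see the Python User Guide for the supported Python and Spark combinations):

```bash
# Install Analytics Zoo from PyPI (pulls in its BigDL and pyspark dependencies)
pip install analytics-zoo
```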
2.2 Scala
- Check the Scala User Guide for how to install Analytics Zoo in a Scala environment.
3. Run Analytics Zoo Applications
Analytics Zoo applications can run on remote or cloud resources, such as YARN, K8s, Databricks, or Google Dataproc.
3.1 Run on YARN
- Python users can follow the instructions in Python User Guide to run Analytics Zoo applications on YARN.
- Scala users can follow the instructions in Scala User Guide to run Analytics Zoo applications on YARN.
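For Python, Analytics Zoo can initialize Spark on YARN directly from the driver program. A rough sketch of that flow is below; the parameter names vary across releases and the paths and environment name are placeholders, so treat this as illustrative pseudocode and consult the Python User Guide for the exact signature:

```python
from zoo import init_spark_on_yarn

# Request a SparkContext backed by a YARN cluster; Analytics Zoo ships the
# named conda environment to the YARN containers automatically.
sc = init_spark_on_yarn(
    hadoop_conf="/path/to/hadoop/conf",  # directory containing yarn-site.xml
    conda_name="my-env",                 # placeholder conda environment name
    num_executors=2,
    executor_cores=4,
    executor_memory="8g")
```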
3.2 Run on K8s
- Check the instructions for how to run Analytics Zoo applications on K8s.
3.3 Run on Databricks
- Check the instructions for how to run Analytics Zoo applications on Databricks.
3.4 Run on Google Dataproc
- Check the instructions for how to run Analytics Zoo applications in a Google Dataproc environment.
4. Develop Analytics Zoo Applications
Analytics Zoo provides comprehensive support for building end-to-end, integrated data analytics and AI applications.
4.1 TensorFlow
- TensorFlow users can leverage TFPark APIs for running distributed TensorFlow on Spark.
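The typical TFPark flow wraps an ordinary tf.keras model and a distributed dataset. A rough pseudocode sketch follows (it assumes Analytics Zoo plus a compatible TensorFlow installation, and the dataset shapes and elided RDD are illustrative; see the TFPark documentation for exact signatures):

```python
import tensorflow as tf
from zoo import init_nncontext
from zoo.tfpark import KerasModel, TFDataset

sc = init_nncontext()  # creates/gets a SparkContext prepared for Analytics Zoo

# An ordinary tf.keras model...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

# ...wrapped for distributed training on Spark
keras_model = KerasModel(model)
train_rdd = ...  # an RDD of (features, label) numpy pairs; elided here
dataset = TFDataset.from_rdd(train_rdd,
                             features=(tf.float32, [784]),
                             labels=(tf.int32, []),
                             batch_size=128)
keras_model.fit(dataset)
```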
4.2 PyTorch
PyTorch users can use Torch Model and either:
- NNFrames APIs to run Spark ML Pipelines and DataFrames with PyTorch support, or
- Estimator APIs to train and evaluate distributed PyTorch on Spark.
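With the Estimator route, a standard PyTorch model and loss are wrapped as Analytics Zoo objects and trained on Spark. The pseudocode sketch below is illustrative only; the Estimator constructor and `train` arguments differ across releases, and the training data construction is elided:

```python
import torch.nn as nn
from zoo import init_nncontext
from zoo.pipeline.api.torch import TorchModel, TorchLoss
from zoo.pipeline.estimator import Estimator

sc = init_nncontext()

# An ordinary PyTorch model and loss...
model = nn.Sequential(nn.Linear(784, 10))
loss_fn = nn.CrossEntropyLoss()

# ...wrapped as Analytics Zoo objects for distributed training
zoo_model = TorchModel.from_pytorch(model)
zoo_loss = TorchLoss.from_pytorch(loss_fn)
estimator = Estimator(zoo_model)
train_data = ...  # a distributed FeatureSet built from an RDD; elided here
estimator.train(train_data, zoo_loss, batch_size=...)
```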
4.3 BigDL
BigDL users can use either:
- NNFrames APIs to run Spark ML Pipelines and DataFrames with BigDL support, or
- Keras-style APIs for BigDL to build deep learning pipelines.
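The Keras-style APIs mirror the familiar Keras Sequential workflow while running on BigDL underneath. A pseudocode sketch (assuming an Analytics Zoo installation; the layer sizes and elided training data are illustrative):

```python
from zoo.pipeline.api.keras.models import Sequential
from zoo.pipeline.api.keras.layers import Dense

# Build and train a BigDL model through the Keras-style API
model = Sequential()
model.add(Dense(10, activation="softmax", input_shape=(784,)))
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.fit(x=..., batch_size=128, nb_epoch=2)  # training data elided
```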
4.4 Cluster Serving
Analytics Zoo Cluster Serving is a real-time distributed serving solution for deep learning models (including TensorFlow, PyTorch, Caffe, BigDL and OpenVINO). Follow the Cluster Serving Programming Guide to run Cluster Serving; the Cluster Serving API Guide explains the APIs in more detail.
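Cluster Serving is driven by a YAML configuration file. The fragment below is a minimal sketch of its shape; the exact schema is defined in the Programming Guide, and the paths and address here are placeholders:

```yaml
# Illustrative Cluster Serving config.yaml fragment (placeholder values)
model:
  # directory containing the exported model, e.g. a TensorFlow SavedModel
  path: /opt/models/my_model
data:
  # address of the Redis queue used for input/output
  src: localhost:6379
```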
4.5 AutoML
Analytics Zoo provides scalable AutoML support for time series prediction (including automatic feature generation, model selection and hyper-parameter tuning). Check the AutoML Overview for a high-level description of the AutoML framework, and see the Programming Guide and API Guide for details.
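The actual AutoML APIs are covered in the Programming Guide. Purely as a conceptual, self-contained illustration of what hyper-parameter tuning means, here is a toy grid search over a made-up validation objective (none of these names are Analytics Zoo APIs):

```python
from itertools import product

def validation_error(lr, hidden_size):
    """Toy stand-in for training + validation: a smooth made-up objective
    with its minimum at lr=0.1, hidden_size=64."""
    return (lr - 0.1) ** 2 + ((hidden_size - 64) / 64) ** 2

def grid_search(search_space):
    """Exhaustively evaluate every combination and keep the best one."""
    best_params, best_err = None, float("inf")
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        params = dict(zip(keys, values))
        err = validation_error(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

space = {"lr": [0.01, 0.1, 1.0], "hidden_size": [32, 64, 128]}
best, err = grid_search(space)
print(best)  # the combination with the lowest validation error
```

Real AutoML frameworks replace the toy objective with actual model training runs and the exhaustive grid with smarter search strategies, but the select-the-best-configuration loop is the same idea.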