Install


For Python users, Analytics Zoo can be installed either from pip or without pip.

NOTE: Only Python 2.7, Python 3.5 and Python 3.6 are supported for now.


Install from pip

Analytics Zoo can easily be installed via pip using the following commands.

Install analytics-zoo-0.2.0

pip install --upgrade pip
pip install analytics-zoo==0.2.0     # for Python 2.7
pip3 install analytics-zoo==0.2.0    # for Python 3.5 and Python 3.6

Important:

  1. Installing analytics-zoo from pip will automatically install pyspark. To avoid possible conflicts, we highly recommend that you unset SPARK_HOME if it exists in your environment.

  2. After the pip install, always call init_nncontext() at the very beginning of your code. This creates a SparkContext with a performance-optimized configuration and initializes the BigDL engine.

from zoo.common.nncontext import *
sc = init_nncontext()

Remarks:

  1. We've tested this package with pip 9.0.1.
  2. Pip install supports Mac and Linux platforms.
  3. Pip install only supports local mode; cluster mode may be supported in the future. To use Analytics Zoo in cluster mode, install it without pip.
  4. You need to install JDK 8 or later before running Analytics Zoo, as it is required by pyspark (see the example after these remarks).
  5. pyspark (2.2), bigdl==0.6.0 and their dependencies will be installed automatically if they are not detected in the current Python environment.
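
For example, a minimal pip-based setup for Python 3 might look like the following (demo.py is just a placeholder for your own program):

# a JDK (8 or later) must be on the PATH, since pyspark requires it
java -version

# avoid conflicts with an existing Spark installation
unset SPARK_HOME

# install Analytics Zoo; this pulls in pyspark (2.2) and bigdl==0.6.0
pip3 install analytics-zoo==0.2.0

# run your program in local mode; it should call init_nncontext() first
python3 demo.py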

Install without pip

If you choose to install Analytics Zoo without pip, you need to prepare Spark and install the necessary Python dependencies.

Steps:

  1. Download Spark

    • Note that Python 3.6 is only compatible with Spark 1.6.4, 2.0.3, 2.1.1 and >=2.2.0. See this issue for more discussion.
  2. Download the Analytics Zoo release or a nightly build from the Release Page, or build the Analytics Zoo package from source.

  3. Install Python dependencies. Analytics Zoo only depends on numpy and six for now (a sketch of these steps follows this list).
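
A rough sketch of these preparation steps, where the paths are placeholders and ANALYTICS_ZOO_HOME simply stands for wherever you unpacked the Analytics Zoo package:

# point to the unpacked Spark and Analytics Zoo distributions
export SPARK_HOME=/path/to/spark
export ANALYTICS_ZOO_HOME=/path/to/analytics-zoo

# install the Python dependencies Analytics Zoo needs
pip install numpy six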

For Spark standalone cluster
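
To run an Analytics Zoo Python program on a Spark standalone cluster, submit it with spark-submit and attach the Analytics Zoo jar and Python zip that come with the release package. The following is only a minimal sketch; the file names are placeholders and the actual names depend on the release you downloaded:

${SPARK_HOME}/bin/spark-submit \
    --master spark://<master-host>:7077 \
    --driver-memory 4g \
    --executor-memory 4g \
    --jars /path/to/analytics-zoo-jar-with-dependencies.jar \
    --py-files /path/to/analytics-zoo-python-api.zip \
    your_script.py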

For Yarn cluster

You can run Analytics Zoo Python programs on Yarn clusters without changes to the cluster (i.e., no need to pre-install any Python dependency).

First package all the required dependencies into a virtual environment on the local node (where you will run the spark-submit command), and then use spark-submit to run the Analytics Zoo Python program on the Yarn cluster with that virtual environment.

Follow the steps below to create the virtual environment:

apt-get update
apt-get install -y python-setuptools python-dev
apt-get install -y gcc make
apt-get install -y zip
easy_install pip
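
Once these tools are in place, the idea is to build the virtual environment, zip it, and ship it with the job. The following is only a rough sketch of that idea; venv, venv.zip and your_script.py are placeholders, and you would also attach the Analytics Zoo jar and Python zip as in the standalone example:

# create a virtual environment with the required Python dependencies and zip it
pip install virtualenv
virtualenv venv
venv/bin/pip install numpy six
zip -r venv.zip venv

# submit to the Yarn cluster, shipping the zipped environment with the job;
# PYSPARK_PYTHON points the executors at the Python inside the extracted archive
PYSPARK_PYTHON=./venv.zip/venv/bin/python \
${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --archives venv.zip \
    your_script.py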

FAQ

If you encounter the following errors when creating the environment package using the above steps:

  1. virtualenv ImportError: No module named urllib3
    • Using the Python from Anaconda to create the virtualenv may cause this problem. Try your system's default Python instead of installing virtualenv in Anaconda.
  2. AttributeError: 'module' object has no attribute 'sslwrap'
    • Try upgrading gevent with pip install --upgrade gevent.