NOTE: Only Python 2.7, Python 3.5 and Python 3.6 are supported for now.
Install from pip
Analytics Zoo can be installed easily via pip using the following commands.
- Note that you might need to add `sudo` if you don't have permission to install.
```bash
pip install --upgrade pip
pip install analytics-zoo==0.2.0   # for Python 2.7
pip3 install analytics-zoo==0.2.0  # for Python 3.5 and Python 3.6
```
Installing analytics-zoo from pip will automatically install `pyspark`. To avoid possible conflicts, you are highly recommended to unset `SPARK_HOME` if it exists in your environment.
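For example, on Linux or Mac you can check whether `SPARK_HOME` is set in your current shell and clear it before running your program:

```bash
# Print the current value; if this outputs a path, SPARK_HOME is set
echo $SPARK_HOME

# Remove it from the current shell session to avoid conflicts with pyspark
unset SPARK_HOME
```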
Please always first call `init_nncontext()` at the very beginning of your code after pip install. This will create a SparkContext with optimized performance configuration and initialize the BigDL engine.
```python
from zoo.common.nncontext import *

sc = init_nncontext()
```
- We've tested this package with pip 9.0.1.
- Pip install supports Mac and Linux platforms.
- Pip install only supports local mode. Cluster mode might be supported in the future. For those who want to use Analytics Zoo in cluster mode, please try to install without pip.
- You need to install Java >= JDK8 before running Analytics Zoo, which is required by `bigdl==0.6.0` (a quick check is shown after this list).
- `bigdl==0.6.0` and its dependencies will be automatically installed if they haven't been detected in the current Python environment.
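As noted above, a JDK8-or-newer installation is required; a quick way to verify what is on your PATH:

```bash
# Should report version 1.8 (JDK8) or newer
java -version
```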
Install without pip
If you choose to install Analytics Zoo without pip, you need to prepare Spark and install necessary Python dependencies.
- Note that Python 3.6 is only compatible with Spark 1.6.4, 2.0.3, 2.1.1 and >=2.2.0. See this issue for more discussion.
Install Python dependencies. Analytics Zoo only depends on `numpy` and `six`.
For Spark standalone cluster
- Remark: If you're running in cluster mode, you need to install Python dependencies on both client and each worker node.
- Install numpy: `sudo apt-get install python-numpy` (Ubuntu)
- Install six: `sudo apt-get install python-six` (Ubuntu)
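If you manage Python packages with pip instead of apt, installing the same dependencies would look roughly like this (a sketch; both packages are available on PyPI under these names):

```bash
# Install the two Python dependencies via pip instead of apt
pip install numpy six
```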
For Yarn cluster
You can run Analytics Zoo Python programs on Yarn clusters without changes to the cluster (i.e., no need to pre-install any Python dependency).
You can first package all the required dependencies into a virtual environment on the local node (where you will run the spark-submit command), and then directly use spark-submit to run the Analytics Zoo Python program on the Yarn cluster using that virtual environment.
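For illustration only, such a submit command might look roughly like the sketch below, assuming the packaging step produces an archive named `venv.zip` and your program is in a hypothetical `example.py`; the exact environment variables, flags, and resource settings depend on your cluster, so follow the commands referenced after the steps below:

```bash
# Rough sketch (names and settings are assumptions, not the exact supported command):
# ship the packaged virtual environment to YARN and point PySpark at its interpreter
PYSPARK_PYTHON=venv.zip/venv/bin/python spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --archives venv.zip \
    example.py
```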
Follow the steps below to create the virtual environment:
- Make sure the libraries needed to create the virtual environment (python-setuptools, python-dev, gcc, make, zip, pip) are already installed. If not, please install them first. On Ubuntu, you can run the following commands to install them:
```bash
apt-get update
apt-get install -y python-setuptools python-dev
apt-get install -y gcc make
apt-get install -y zip
easy_install pip
```
Create the virtualenv package for dependencies.
Under `$ANALYTICS_ZOO_HOME` (the dist directory under the Analytics Zoo project), you can find `bin/python_package.sh`. Run this script to create the dependency virtual environment according to the dependencies listed in `requirements.txt`. You can add your own dependencies into this file if you wish. The current requirements only contain those needed for running Analytics Zoo Python examples and models.
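For example, creating the environment could look roughly like this (a sketch; the commented `pandas` line is only a hypothetical illustration of adding your own dependency):

```bash
cd $ANALYTICS_ZOO_HOME

# Optionally add your own dependencies to requirements.txt, e.g.
# echo "pandas" >> requirements.txt   # hypothetical extra dependency

# Create the dependency virtual environment
bash bin/python_package.sh
```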
After running this script, there will be a `venv` directory generated in the current directory. You can use it to submit your Python jobs. Please refer to here for the commands to submit an Analytics Zoo Python job with the created virtual environment in a Yarn cluster.
In case you encounter the following errors when you create the environment package using the above command:
- virtualenv ImportError: No module named urllib3
- Using the Python from Anaconda to create the virtualenv may cause this problem. Try using the default Python in your system instead of installing virtualenv in Anaconda.
- AttributeError: 'module' object has no attribute 'sslwrap'
- Try upgrading `gevent`: `pip install --upgrade gevent`.