Docker User Guide


In order to simplify the Analytics Zoo installation and configuration, Analytics Zoo docker images have been built and provided on Docker Hub. These docker images have been pre-built with all the dependencies and readily configured to run a bunch of Analytics Zoo examples out-of-box on Linux (such as Ubuntu, CentOS), MacOS or Windows. The pre-installed packages are listed at the end of this page. This document provides step-by-step instructions for users to easily start using the Analytics Zoo docker:


Launch an Analytics Zoo Docker Container

A Linux user (CentOS or Ubuntu) can pull a Docker image and launch the Docker container with one command line.

1. Getting Ready

  1. Fresh install OS such as Linux OS (CentOS, Ubuntu etc), MacOS or Windows.
  2. Download and Install Docker following the instructions on: https://docs.docker.com/get-docker/
  3. (Optional) For Docker Desktop users on MacOS and Windows, the default resources (2 CPUs and 2GB memory) is relatively small, you may want to increase them by changing docker desktop configuration. 8GB memory and 4 CPUs should be a good estimator for most examples, and the exact memory requirements vary from different applications. For more information on this, please refer to docker documentation. (here for Mac and here for windows)
  4. (optional) Pull an Analytics Zoo docker image
  5. Note: It is optional to pull an Analytics Zoo docker image in advance, as the command “docker run” in Launch Analytics Zoo Docker container step will check the availability of the docker image and pull the image if it is absent. To manually pull the latest docker image, use:
sudo docker pull intelanalytics/analytics-zoo:latest

To speed up pulling the image from dockerhub in China, add a registry's mirror. For Linux OS (CentOS, Ubuntu etc), if the docker version is higher than 1.12, config the docker daemon. Edit /etc/docker/daemon.json and add the registry-mirrors key and value:

{
  "registry-mirrors": ["https://<my-docker-mirror-host>"]
}

For example, add the ustc mirror in China.

{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"]
}

Flush changes and restart docker:

sudo systemctl daemon-reload
sudo systemctl restart docker

If your docker version is between 1.8 and 1.11, find the docker configuration which location depends on the operation system. Edit and add DOCKER_OPTS="--registry-mirror=https://<my-docker-mirror-host>". Restart docker sudo service docker restart.

If you would like to speed up pulling this image on MacOS or Windows, find the docker setting and config registry-mirrors section by specifying mirror host. Restart docker.

Then pull the image. It will be faster.

sudo docker pull intelanalytics/analytics-zoo:latest

2. Launch Analytics Zoo Docker container

Launch an Analytics Zoo Docker container with this command line:

$sudo docker run -it --rm --net=host -e NOTEBOOK_PORT=12345 -e NOTEBOOK_TOKEN="your-token" intelanalytics/analytics-zoo:latest bash

If you want to specify an Analytics Zoo version, for example 0.7.0, use:

sudo docker run -it --rm --net=host \
    -e NOTEBOOK_PORT=12345 \
    -e NOTEBOOK_TOKEN="your-token" \
    intelanalytics/analytics-zoo:0.7.0-bigdl_0.10.0-spark_2.4.3 bash

If you need to use http/https proxy, use:

sudo docker run -it --rm --net=host \
    -e NOTEBOOK_PORT=12345 \
    -e NOTEBOOK_TOKEN="your-token" \
    -e http_proxy=http://your-proxy-host:your-proxy-port \
    -e https_proxy=https://your-proxy-host:your-proxy-port \
    intelanalytics/analytics-zoo:default bash

Once the container is sucessfully launched, you will automatically login to the container and see this as the output:

root@[hostname]:/opt/work#

Note: The /opt/work directory contains:


Run Analytics Zoo Jupyter Notebook example in a container

This section depends on the previous section “Launch Analytics Zoo Docker container”. After the user launches the Analytics Zoo Docker container, the Jupyter Notebook service can be started and Analytics Zoo Jupyter Notebook examples are available.

Start the Jupyter Notebook service in the container

After a Docker container is launched and user login to the container, user can start the Jupyter Notebook service inside the container.

In the /opt/work directory, run this command line to start the Jupyter Notebook service:

#start-notebook.sh

As a result, you will see the output message like below. This means the Jupyter Notebook service has started successfully within the container.

[I 01:04:45.625 NotebookApp] Serving notebooks from local directory: /opt/work/analytics-zoo-0.5.0-SNAPSHOT/apps
[I 01:04:45.625 NotebookApp] The Jupyter Notebook is running at:
[I 01:04:45.625 NotebookApp] http://(the-host-name or 127.0.0.1):12345/?token=...
[I 01:04:45.625 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Connect to Jupyter Notebook service from a browser

After the Jupyter Notebook service is successfully started, users can connect to the Jupyter Notebook service from a browser.

  1. Get the IP address of the container
  2. Launch a browser, and connect to the Jupyter Notebook service with the URL: https://container-ip-address:port-number/?token=your-token

As a result, you will see the Jupyter Notebook like this:

notebook1

Run Analytics Zoo Jupyter Notebook examples

After connecting to the Jupyter Notebook in the browser, users can run multiple Analytics Zoo Jupyter Notebook examples. The example shown below is the “dogs-vs-cats”.

notebook2

notebook3

notebook4

notebook5


Create a new Analytics Zoo Jupyter Notebook example

There are a few ways to quickly start creating a new Analytics Zoo Jupyter Notebook example. Users can “New” a Python notebook from scratch, or use the existing Jupyter Notebook example as an template and modify based on it.

“New” a Jupyter Notebook and write Analytics Zoo code from scratch

Once the user is connected to the Jupyter Notebook service and opens the address in a browser, the user can instantly click the “New” button and start creating a Python notebook example in the existing Analytics Zoo Jupyter Notebook folder.

notebook6

“Make a Copy” from an existing Analytics Zoo Jupyter Notebook example and model after it

The user can also select an existing Analytics Zoo Jupyter Notebook example approximating to his own use case, then click "Make a Copy" and start tailing it.

notebook7


Shut Down the Analytics Zoo Docker container

Users should shut down the Analytics Zoo Docker container after using it.

A user can list all the active Docker containers by command line:

$sudo docker ps

As a result, your docker container should be shown like:

CONTAINER ID        IMAGE                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
40de2cdad025        intelanalytics/analytics-zoo:latest          "/opt/work/start-n..."   3 hours ago         Up 3 hours                               upbeat_al  

Shut down the corresponding docker container by its ID:

$sudo docker rm -f 40de2cdad025        

Build a customized Analytics Zoo Docker image

A set of pre-build Analytics Zoo Docker images have been provided on the Docker Hub. Users can retrieve these Docker images by “docker pull” command and specify a tag for which Docker image to download. For example:

sudo docker pull intelanalytics/analytics-zoo:0.7.0-bigdl_0.10.0-spark_2.4.3
sudo docker pull intelanalytics/analytics-zoo:latest 

It is recommended to use a pre-build Docker image on the Docker Hub. However you can build or customize an Analytics Zoo Docker image if you do need to do so. Visit Analytics Zoo Dockerfile. Customize it and build the image. For example:

sudo docker build --rm -t intelanalytics/analytics-zoo:default . 
sudo docker build \
    --build-arg http_proxy=http://your-proxy-host:your-proxy-port \
    --build-arg https_proxy=https://your-proxy-host:your-proxy-port \
    --rm -t intelanalytics/analytics-zoo:default .
sudo docker build \
    --build-arg ANALYTICS_ZOO_VERSION=0.11.0 \
    --build-arg BIGDL_VERSION=0.13.0 \
    --build-arg SPARK_VERSION=2.4.6 \
    --rm -t intelanalytics/analytics-zoo:0.11.0-bigdl_0.13.0-spark_2.4.6 .

Pre-installed Packages

The Analytics-Zoo Docker images have been pre-built with below packages:


In addition to the general Analytics Zoo docker image introduced in this page, the Analytics Zoo hyperzoo image is also pre-built, which is added K8s and cluster serving support. To learn about the hyperzoo imgae, visit here.