TensorFlow Quickstart
In this guide we will describe how to scale out TensorFlow (v1.15) programs using Orca in 4 simple steps.
Run in Google Colab View source on GitHub
Step 0: Prepare Environment
We recommend using Conda to prepare the environment. Please refer to the install guide for more details.
Note: Conda environment is required to run on the distributed cluster, but not strictly necessary for running on the local machine.
conda create -n zoo python=3.7 # "zoo" is conda environment name, you can use any name you like.
conda activate zoo
pip install analytics-zoo # install either version 0.9 or latest nightly build
pip install tensorflow==1.15.0
pip install tensorflow-datasets==2.0
pip install psutil
Step 1: Init Orca Context
if args.cluster_mode == "local":
init_orca_context(cluster_mode="local", cores=4)# run in local mode
elif args.cluster_mode == "k8s":
init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2) # run on K8s cluster
elif args.cluster_mode == "yarn":
init_orca_context(cluster_mode="yarn-client", num_nodes=2, cores=2) # run on Hadoop YARN cluster
This is the only place where you need to specify local or distributed mode. View Orca Context for more details.
Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir
when you run on Hadoop YARN cluster.
Step 2: Define the Model
You may define your model, loss and metrics in the same way as in any standard (single node) TensorFlow program.
import tensorflow as tf
def accuracy(logits, labels):
predictions = tf.argmax(logits, axis=1, output_type=labels.dtype)
is_correct = tf.cast(tf.equal(predictions, labels), dtype=tf.float32)
return tf.reduce_mean(is_correct)
def lenet(images):
with tf.variable_scope('LeNet', [images]):
net = tf.layers.conv2d(images, 32, (5, 5), activation=tf.nn.relu, name='conv1')
net = tf.layers.max_pooling2d(net, (2, 2), 2, name='pool1')
net = tf.layers.conv2d(net, 64, (5, 5), activation=tf.nn.relu, name='conv2')
net = tf.layers.max_pooling2d(net, (2, 2), 2, name='pool2')
net = tf.layers.flatten(net)
net = tf.layers.dense(net, 1024, activation=tf.nn.relu, name='fc3')
logits = tf.layers.dense(net, 10)
return logits
# tensorflow inputs
images = tf.placeholder(dtype=tf.float32, shape=(None, 28, 28, 1))
# tensorflow labels
labels = tf.placeholder(dtype=tf.int32, shape=(None,))
logits = lenet(images)
loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(logits=logits, labels=labels))
acc = accuracy(logits, labels)
Step 3: Define Train Dataset
You can define the dataset using standard tf.data.Dataset. Orca also supports Spark DataFrame and Orca XShards.
import tensorflow_datasets as tfds
def preprocess(data):
data['image'] = tf.cast(data["image"], tf.float32) / 255.
return data['image'], data['label']
# get DataSet
mnist_train = tfds.load(name="mnist", split="train", data_dir=dataset_dir)
mnist_test = tfds.load(name="mnist", split="test", data_dir=dataset_dir)
mnist_train = mnist_train.map(preprocess)
mnist_test = mnist_test.map(preprocess)
Step 4: Fit with Orca Estimator
First, create an Estimator.
from zoo.orca.learn.tf.estimator import Estimator
est = Estimator.from_graph(inputs=images,
outputs=logits,
labels=labels,
loss=loss,
optimizer=tf.train.AdamOptimizer(),
metrics={"acc": acc})
Next, fit and evaluate using the Estimator.
est.fit(data=train_dataset,
batch_size=320,
epochs=5,
validation_data=mnist_test)
result = est.evaluate(mnist_test)
print(result)
That's it, the same code can run seamlessly in your local laptop and the distribute K8s or Hadoop cluster.
Note: You should call stop_orca_context()
when your program finishes.