KerasModel
KerasModel enables users to define TensorFlow models with the tf.keras API and perform training or evaluation on top of Spark and BigDL in a distributed fashion.
Remarks:

- You need to install tensorflow==1.15.0 on your driver node.
- Your operating system (OS) is required to be one of the following 64-bit systems: Ubuntu 16.04 or later, or macOS 10.12.6 or later.
- To run on other systems, you need to manually compile the TensorFlow source code. Instructions can be found here.
from zoo.tfpark import KerasModel, TFDataset
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
     tf.keras.layers.Dense(64, activation='relu'),
     tf.keras.layers.Dense(10, activation='softmax'),
     ]
)

model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

keras_model = KerasModel(model)
Methods

__init__

KerasModel(model)

Arguments

- model: a compiled keras model defined using tf.keras
fit

fit(x=None, y=None, batch_size=None, epochs=1, validation_data=None, distributed=False)
Arguments

- x: Input data. It could be:
  - a TFDataset object
  - a Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
  - a dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- y: Target data. Like the input data x, it should be consistent with x (you cannot have Numpy inputs and tensor targets, or inversely). If x is a TFDataset, y should not be specified (since targets will be obtained from x).
- batch_size: Integer or None. Number of samples per gradient update. If x is a TFDataset, you do not need to specify batch_size.
- epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.
- validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data could be a tuple (x_val, y_val) of Numpy arrays or tensors.
- distributed: Boolean. Whether to do training in distributed mode or local mode. Default is False. In local mode, x must be a Numpy array.
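As a sketch of a local-mode fit call (the data shapes and hyper-parameters here are illustrative, not from the source; the zoo-specific part is guarded so it only runs where analytics-zoo, Spark and tensorflow==1.15.0 are installed):

```python
import numpy as np

# Illustrative MNIST-shaped data; in practice load a real dataset.
x_train = np.random.rand(100, 28, 28, 1).astype(np.float32)
y_train = np.random.randint(0, 10, size=(100,)).astype(np.int32)

try:
    # Requires analytics-zoo, Spark and tensorflow==1.15.0 on the driver.
    from zoo.tfpark import KerasModel
    import tensorflow as tf

    model = tf.keras.Sequential(
        [tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
         tf.keras.layers.Dense(64, activation='relu'),
         tf.keras.layers.Dense(10, activation='softmax'),
         ]
    )
    model.compile(optimizer=tf.keras.optimizers.RMSprop(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    keras_model = KerasModel(model)

    # Local-mode fit on Numpy arrays (distributed=False).
    keras_model.fit(x_train, y_train, batch_size=32, epochs=2,
                    distributed=False)
except ImportError:
    # analytics-zoo / TF 1.15 not available in this environment.
    pass
```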
evaluate

evaluate(x=None, y=None, batch_per_thread=None, distributed=False)
Arguments

- x: Input data. It could be:
  - a TFDataset object
  - a Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
  - a dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- y: Target data. Like the input data x, it should be consistent with x (you cannot have Numpy inputs and tensor targets, or inversely). If x is a TFDataset, y should not be specified (since targets will be obtained from x).
- batch_per_thread: The default value is 1. When distributed is True, the total batch size is batch_per_thread * rdd.getNumPartitions. When distributed is False, the total batch size is batch_per_thread * numOfCores.
- distributed: Boolean. Whether to do evaluation in distributed mode or local mode. Default is False. In local mode, x must be a Numpy array.
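The batch_per_thread semantics described above amount to simple arithmetic, sketched here with a hypothetical helper (not part of the API, for illustration only):

```python
def total_batch_size(batch_per_thread, distributed,
                     num_partitions=None, num_cores=None):
    """Hypothetical helper: total batch size as described above."""
    if distributed:
        # Distributed mode: batch_per_thread * rdd.getNumPartitions
        return batch_per_thread * num_partitions
    # Local mode: batch_per_thread * number of cores
    return batch_per_thread * num_cores

# e.g. 4 samples per thread across 8 RDD partitions -> 32 samples per step
print(total_batch_size(4, distributed=True, num_partitions=8))   # -> 32
print(total_batch_size(4, distributed=False, num_cores=2))       # -> 8
```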
predict

predict(x, batch_per_thread=None, distributed=False)

Arguments

- x: Input data. It could be:
  - a TFDataset object
  - a Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
  - a dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- batch_per_thread: The default value is 1. When distributed is True, the total batch size is batch_per_thread * rdd.getNumPartitions. When distributed is False, the total batch size is batch_per_thread * numOfCores.
- distributed: Boolean. Whether to do prediction in distributed mode or local mode. Default is False. In local mode, x must be a Numpy array.
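A minimal local-mode predict sketch (input shapes are illustrative; the zoo-specific part is guarded so it only runs where analytics-zoo, Spark and tensorflow==1.15.0 are installed):

```python
import numpy as np

# Illustrative inputs matching the example model's (28, 28, 1) shape.
x_test = np.random.rand(16, 28, 28, 1).astype(np.float32)

try:
    # Requires analytics-zoo, Spark and tensorflow==1.15.0.
    from zoo.tfpark import KerasModel
    import tensorflow as tf

    model = tf.keras.Sequential(
        [tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
         tf.keras.layers.Dense(10, activation='softmax'),
         ]
    )
    model.compile(optimizer='rmsprop',
                  loss='sparse_categorical_crossentropy')
    keras_model = KerasModel(model)

    # Local mode: x must be a Numpy array; returns class probabilities.
    probs = keras_model.predict(x_test, distributed=False)
except ImportError:
    # analytics-zoo / TF 1.15 not available in this environment.
    pass
```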