KerasModel
KerasModel enables users to use the tf.keras API to define TensorFlow models and perform training or evaluation on top of Spark and BigDL in a distributed fashion.
Remarks:
- You need to install tensorflow==1.15.0 on your driver node.
- Your operating system (OS) is required to be one of the following 64-bit systems: Ubuntu 16.04 or later, or macOS 10.12.6 or later.
- To run on other systems, you need to manually compile the TensorFlow source code. Instructions can be found here.
```python
from zoo.tfpark import KerasModel, TFDataset
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
     tf.keras.layers.Dense(64, activation='relu'),
     tf.keras.layers.Dense(10, activation='softmax'),
     ]
)

model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

keras_model = KerasModel(model)
```
Methods
__init__
```python
KerasModel(model)
```
Arguments
- model: a compiled keras model defined using tf.keras
fit
```python
fit(x=None, y=None, batch_size=None, epochs=1, validation_data=None, distributed=False)
```
Arguments
- x: Input data. It could be:
    - a TFDataset object
    - a Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
    - a dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- y: Target data. Like the input data x, it should be consistent with x (you cannot have Numpy inputs and tensor targets, or vice versa). If x is a TFDataset, y should not be specified (since targets will be obtained from x).
- batch_size: Integer or None. Number of samples per gradient update. If x is a TFDataset, you do not need to specify batch_size.
- epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.
- validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data could be:
    - a tuple (x_val, y_val) of Numpy arrays or tensors
- distributed: Boolean. Whether to do training in distributed mode or local mode. Default is False. In local mode, x must be a Numpy array.
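For reference, below is a minimal sketch of a fit call in local mode on Numpy data; the arrays, shapes, and hyperparameters are illustrative only and assume the MNIST-like model compiled above. Setting distributed=True would instead run the training on Spark and BigDL.

```python
import numpy as np

# Illustrative data with the same shape as the model's input (28, 28, 1).
x_train = np.random.rand(256, 28, 28, 1).astype(np.float32)
y_train = np.random.randint(0, 10, size=(256,))

# Train in local mode on Numpy arrays; flip distributed=True to train on Spark/BigDL.
keras_model.fit(x_train, y_train,
                batch_size=32,
                epochs=2,
                validation_data=(x_train[:64], y_train[:64]),
                distributed=False)
```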
evaluate
```python
evaluate(x=None, y=None, batch_per_thread=None, distributed=False)
```
Arguments
- x: Input data. It could be:
    - a TFDataset object
    - a Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
    - a dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- y: Target data. Like the input data x, it should be consistent with x (you cannot have Numpy inputs and tensor targets, or vice versa). If x is a TFDataset, y should not be specified (since targets will be obtained from x).
- batch_per_thread: The default value is 1. When distributed is True, the total batch size is batch_per_thread * rdd.getNumPartitions. When distributed is False, the total batch size is batch_per_thread * numOfCores.
- distributed: Boolean. Whether to do evaluation in distributed mode or local mode. Default is False. In local mode, x must be a Numpy array.
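As a sketch of the call above, the snippet below evaluates the model on Numpy data in local mode; the test arrays are illustrative placeholders, assumed to be preprocessed the same way as the training data.

```python
import numpy as np

# Illustrative held-out data shaped like the model's input.
x_test = np.random.rand(128, 28, 28, 1).astype(np.float32)
y_test = np.random.randint(0, 10, size=(128,))

# Evaluate in local mode; the returned values correspond to the loss and the
# metrics specified in model.compile (here: accuracy).
results = keras_model.evaluate(x_test, y_test, batch_per_thread=32, distributed=False)
print(results)
```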
predict
```python
predict(x, batch_per_thread=None, distributed=False)
```
Arguments
- x: Input data. It could be:
    - a TFDataset object
    - a Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
    - a dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- batch_per_thread: The default value is 1. When distributed is True, the total batch size is batch_per_thread * rdd.getNumPartitions. When distributed is False, the total batch size is batch_per_thread * numOfCores.
- distributed: Boolean. Whether to do prediction in distributed mode or local mode. Default is False. In local mode, x must be a Numpy array.
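A minimal sketch of a predict call in local mode follows; the input array is illustrative, and with distributed=True the same call would run across the Spark cluster.

```python
import numpy as np

# Illustrative batch of new samples to score.
x_new = np.random.rand(16, 28, 28, 1).astype(np.float32)

# Predict in local mode; the result holds the softmax outputs of the last layer,
# so its shape is (16, 10) for the model defined above.
predictions = keras_model.predict(x_new, batch_per_thread=8, distributed=False)
print(predictions.shape)
```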