Recommendation


Analytics Zoo provides two recommender models: the Wide and Deep (WND) model and the Neural network-based Collaborative Filtering (NCF) model. Each model can be fed directly into NNFrames or the BigDL Optimizer for training.

Recommenders can handle either explicit or implicit feedback, given the corresponding features.

We also provide three user-friendly APIs to predict user-item pairs and to recommend items for users or users for items. See here for more details.


Wide and Deep

The Wide and Deep Learning model, proposed by Google in 2016, is a mixed DNN-linear model that combines the strengths of memorization and generalization. It is useful for generic large-scale regression and classification problems with sparse input features (e.g., categorical features with a large number of possible feature values). It has been used for app recommendation in the Google Play store.
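As a rough illustration of how the two parts combine, the pure-Python sketch below adds a linear score over sparse (wide) features to the output of a small feed-forward network over dense (deep) features, then applies a sigmoid. The function names, toy weights, and binary output are assumptions made for this example; it is not the Analytics Zoo implementation.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # weights[j][i] connects input i to output j.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def wide_logit(active_indices, wide_weights):
    # Wide part: a linear model over sparse one-hot or crossed features,
    # so the score is the sum of the weights of the active indices.
    return sum(wide_weights[i] for i in active_indices)


def deep_logit(dense_features, layers):
    # Deep part: ReLU hidden layers followed by a linear single-output layer.
    activations = dense_features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return dense(activations, weights, biases)[0]


def wide_and_deep_probability(active_indices, wide_weights, dense_features, layers):
    # The two logits are summed before the final sigmoid.
    logit = wide_logit(active_indices, wide_weights) + deep_logit(dense_features, layers)
    return 1.0 / (1.0 + math.exp(-logit))


# Toy weights chosen by hand for illustration only.
wide_weights = [0.5, -0.25, 0.0, 1.0]
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # hidden layer: 2 -> 2
    ([[0.5, -0.5]], [0.0]),                  # output layer: 2 -> 1
]
probability = wide_and_deep_probability([0, 3], wide_weights, [2.0, 1.0], layers)
```

Here the wide part memorizes specific feature combinations through per-index weights, while the deep part generalizes through shared hidden layers.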

After training the model, users can use the model to do prediction and recommendation.

Scala

val wideAndDeep = WideAndDeep(modelType = "wide_n_deep", numClasses, columnInfo, hiddenLayers = Array(40, 20, 10))

See here for the Scala example that trains the WideAndDeep model on the MovieLens 1M dataset and uses the model for prediction and recommendation.

Python

wide_and_deep = WideAndDeep(class_num, column_info, model_type="wide_n_deep", hidden_layers=(40, 20, 10))

See here for the Python notebook that trains the WideAndDeep model on the MovieLens 1M dataset and uses the model for prediction and recommendation.


Neural network-based Collaborative Filtering

NCF (He et al., 2017) uses a multi-layer perceptron to learn the user-item interaction function. At the same time, NCF can express and generalize matrix factorization under its framework. The includeMF (Boolean) option lets users build a NeuralCF model with or without matrix factorization.
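The structure described above can be sketched in pure Python as follows. This is a minimal, untrained forward pass under stated assumptions: the class name TinyNCF, the random initialization, the bias-free layers, and the 1-based IDs are all choices made for this example, not the NeuralCF implementation. The MLP consumes the concatenated user and item embeddings, and when include_mf is true an element-wise product of separate matrix factorization embeddings is concatenated in front of the MLP output.

```python
import math
import random


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(inputs, weights):
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyNCF:
    """Hypothetical stand-in that mirrors the NeuralCF arguments shown in this section."""

    def __init__(self, user_count, item_count, class_num, user_embed=4, item_embed=4,
                 hidden_layers=(8, 4), include_mf=True, mf_embed=4, seed=0):
        rng = random.Random(seed)
        self.include_mf = include_mf
        # Assumes IDs are positive integers, so row 0 is left unused.
        self.user_table = make_matrix(user_count + 1, user_embed, rng)
        self.item_table = make_matrix(item_count + 1, item_embed, rng)
        self.mlp = []
        width = user_embed + item_embed
        for size in hidden_layers:
            self.mlp.append(make_matrix(size, width, rng))
            width = size
        if include_mf:
            self.mf_user = make_matrix(user_count + 1, mf_embed, rng)
            self.mf_item = make_matrix(item_count + 1, mf_embed, rng)
            width += mf_embed
        self.output = make_matrix(class_num, width, rng)

    def forward(self, user_id, item_id):
        # MLP branch: concatenate the two embeddings, then apply ReLU layers.
        hidden = self.user_table[user_id] + self.item_table[item_id]
        for weights in self.mlp:
            hidden = [max(0.0, v) for v in linear(hidden, weights)]
        if self.include_mf:
            # Matrix factorization branch: element-wise product of embeddings.
            product = [u * i for u, i in zip(self.mf_user[user_id], self.mf_item[item_id])]
            hidden = product + hidden
        return softmax(linear(hidden, self.output))


model = TinyNCF(user_count=10, item_count=20, class_num=5)
probabilities = model.forward(user_id=3, item_id=7)
```

With include_mf=False the output layer sees only the last hidden layer, which is the only structural difference between the two variants in this sketch.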

After training the model, users can use the model to do prediction and recommendation.

Scala

val ncf = NeuralCF(userCount, itemCount, numClasses, userEmbed = 20, itemEmbed = 20, hiddenLayers = Array(40, 20, 10), includeMF = true, mfEmbed = 20)

See here for the Scala example that trains the NeuralCF model on the MovieLens 1M dataset and uses the model for prediction and recommendation.

Python

ncf = NeuralCF(user_count, item_count, class_num, user_embed=20, item_embed=20, hidden_layers=(40, 20, 10), include_mf=True, mf_embed=20)

See here for the Python notebook that trains the NeuralCF model on the MovieLens 1M dataset and uses the model for prediction and recommendation.


Prediction and Recommendation

Predict for user-item pairs

Gives a prediction for each user-item pair. Returns an RDD of UserItemPrediction.

Scala

predictUserItemPair(featureRdd)

Python

predict_user_item_pair(feature_rdd)

Parameters:

featureRdd (Python: feature_rdd): RDD of UserItemFeature.

Recommend for users

Recommends a number of items for each user. Returns an RDD of UserItemPrediction.

Scala

recommendForUser(featureRdd, maxItems)

Python

recommend_for_user(feature_rdd, max_items)

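Parameters:

featureRdd (Python: feature_rdd): RDD of UserItemFeature.
maxItems (Python: max_items): number of items to recommend for each user.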

Recommend for items

Recommends a number of users for each item. Returns an RDD of UserItemPrediction.

Scala

recommendForItem(featureRdd, maxUsers)

Python

recommend_for_item(feature_rdd, max_users)

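Parameters:

featureRdd (Python: feature_rdd): RDD of UserItemFeature.
maxUsers (Python: max_users): number of users to recommend for each item.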
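To make the "top N per user" and "top N per item" semantics concrete, here is a small pure-Python sketch that works on an in-memory list instead of an RDD. The Prediction tuple and both helper functions are hypothetical stand-ins, and the ranking rule (higher predicted class first, ties broken by higher probability) is an assumption for illustration rather than documented library behavior.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the same four fields as UserItemPrediction.
Prediction = namedtuple("Prediction", ["user_id", "item_id", "prediction", "probability"])


def _top_n(predictions, key_field, limit):
    # Group the records by user or by item.
    groups = defaultdict(list)
    for record in predictions:
        groups[getattr(record, key_field)].append(record)
    result = {}
    for key, records in groups.items():
        # Assumed ranking: predicted class first, then probability, both descending.
        ranked = sorted(records, key=lambda r: (r.prediction, r.probability), reverse=True)
        result[key] = ranked[:limit]
    return result


def top_items_per_user(predictions, max_items):
    return _top_n(predictions, "user_id", max_items)


def top_users_per_item(predictions, max_users):
    return _top_n(predictions, "item_id", max_users)


predictions = [
    Prediction(1, 10, 5, 0.9),
    Prediction(1, 11, 3, 0.8),
    Prediction(1, 12, 5, 0.6),
    Prediction(2, 10, 4, 0.7),
]
top_items = top_items_per_user(predictions, max_items=2)
top_users = top_users_per_item(predictions, max_users=1)
```

In this toy data, user 1 keeps items 10 and 12 (both predicted as class 5), and item 10 keeps user 1, whose predicted class is higher than that of user 2.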


Model Save

After building and training a WideAndDeep or NeuralCF model, you can save it for future use.

Scala

wideAndDeep.saveModel(path, weightPath = null, overWrite = false)

ncf.saveModel(path, weightPath = null, overWrite = false)

Python

wide_and_deep.save_model(path, weight_path=None, over_write=False)

ncf.save_model(path, weight_path=None, over_write=False)

Model Load

To load a WideAndDeep or NeuralCF model (with weights) saved above:

Scala

WideAndDeep.loadModel[Float](path, weightPath = null)

NeuralCF.loadModel[Float](path, weightPath = null)

Python

WideAndDeep.load_model(path, weight_path=None)

NeuralCF.load_model(path, weight_path=None)
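The save and load round trip, and the role of an overwrite flag, can be sketched with standard-library tools. The two functions below are hypothetical stand-ins that pickle a plain dictionary. The refusal to replace an existing file unless over_write is set is an assumption about how such a flag typically behaves, and the weight_path argument is not modeled.

```python
import os
import pickle
import tempfile


def save_model(model, path, over_write=False):
    # Assumed behavior: refuse to clobber an existing file unless asked to.
    if os.path.exists(path) and not over_write:
        raise FileExistsError(f"{path} already exists; pass over_write=True to replace it")
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "ncf.model")

# First save succeeds because nothing exists at the path yet.
save_model({"hidden_layers": (40, 20, 10)}, path)
restored = load_model(path)

# Second save without over_write is refused in this sketch.
try:
    save_model({"hidden_layers": (20, 10)}, path)
    refused = False
except FileExistsError:
    refused = True

# With over_write=True the file is replaced.
save_model({"hidden_layers": (20, 10)}, path, over_write=True)
replaced = load_model(path)
```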

UserItemFeature

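Represents a user-item record with features.

Each record should contain the following fields:

userId (Python: user_id): ID of the user.
itemId (Python: item_id): ID of the item.
sample: Sample that holds the feature(s) and label(s) for this user-item pair.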

Scala

UserItemFeature(userId, itemId, sample)

Python

UserItemFeature(user_id, item_id, sample)

UserItemPrediction

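Represents the prediction result for a user-item pair.

Each prediction record will contain the following information:

userId (Python: user_id): ID of the user.
itemId (Python: item_id): ID of the item.
prediction: the prediction for the user on the item.
probability: the probability of the prediction.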

Scala

UserItemPrediction(userId, itemId, prediction, probability)

Python

UserItemPrediction(user_id, item_id, prediction, probability)

ColumnFeatureInfo

An instance of ColumnFeatureInfo holds the column information shared by the WideAndDeep model and its feature generation step.

You can choose to include the following information for feature engineering and the WideAndDeep model:

Remark:

Fields that involve Cols should be an array of String (Scala) or a list of String (Python) indicating the name of the columns in the data.

Fields that involve Dims should be an array of integers (Scala) or a list of integers (Python) indicating the dimensions of the corresponding columns.

If a field is not specified, it defaults to an empty array (Scala) or an empty list (Python).

Scala

ColumnFeatureInfo(
    wideBaseCols = Array[String](),
    wideBaseDims = Array[Int](),
    wideCrossCols = Array[String](),
    wideCrossDims = Array[Int](),
    indicatorCols = Array[String](),
    indicatorDims = Array[Int](),
    embedCols = Array[String](),
    embedInDims = Array[Int](),
    embedOutDims = Array[Int](),
    continuousCols = Array[String](),
    label = "label")

Python

ColumnFeatureInfo(
    wide_base_cols=None,
    wide_base_dims=None,
    wide_cross_cols=None,
    wide_cross_dims=None,
    indicator_cols=None,
    indicator_dims=None,
    embed_cols=None,
    embed_in_dims=None,
    embed_out_dims=None,
    continuous_cols=None,
    label="label")
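The relationship between the Cols and Dims fields can be illustrated with a small stand-in. The sketch below assumes one dimension per column name and that a crossed column is produced by hashing the combined raw values into a fixed number of buckets. The class ColumnInfoSketch, the helper cross_bucket, the CRC32 hash, and the "age-gender" column name are all hypothetical and only cover the wide fields.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnInfoSketch:
    # Unspecified fields default to empty lists, as described in the remark.
    wide_base_cols: List[str] = field(default_factory=list)
    wide_base_dims: List[int] = field(default_factory=list)
    wide_cross_cols: List[str] = field(default_factory=list)
    wide_cross_dims: List[int] = field(default_factory=list)
    label: str = "label"

    def dims_match_cols(self):
        # Assumed invariant: each column name has exactly one dimension.
        pairs = [
            (self.wide_base_cols, self.wide_base_dims),
            (self.wide_cross_cols, self.wide_cross_dims),
        ]
        return all(len(cols) == len(dims) for cols, dims in pairs)


def cross_bucket(values, bucket_count):
    # Join the raw values, hash them deterministically, and map the hash
    # into the range [0, bucket_count).
    joined = "_".join(str(v) for v in values)
    return zlib.crc32(joined.encode("utf-8")) % bucket_count


info = ColumnInfoSketch(wide_cross_cols=["age-gender"], wide_cross_dims=[100])
bucket = cross_bucket((25, "F"), info.wide_cross_dims[0])
```

Every value of the crossed column then falls inside the dimension declared for it, which is the property the Dims fields are meant to describe.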