Recommendation


Analytics Zoo provides three recommenders: the Wide and Deep (WND) model, the Neural network-based Collaborative Filtering (NCF) model, and the Session Recommender model. They are defined as easy-to-use Keras-style models that provide compile and fit methods for training. Alternatively, they can be fed into NNFrames or the BigDL Optimizer.

WND and NCF recommenders can handle either explicit or implicit feedback, given the corresponding features.

We also provide three user-friendly APIs to predict user-item pairs and to recommend items for users or users for items. See the Prediction and Recommendation section below for more details.


Wide and Deep

Wide and Deep Learning Model, proposed by Google in 2016, is a mixed DNN-linear model that combines the strengths of memorization and generalization. It is useful for generic large-scale regression and classification problems with sparse input features (e.g., categorical features with a large number of possible feature values). It has been used in Google Play for app recommendation.
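
To make the "DNN-linear mixed model" idea concrete, here is a minimal pure-Python sketch, not the Analytics Zoo implementation. It assumes a binary classification setting in which the logit of a linear model over sparse features (the wide part) is added to the output of a small MLP over dense inputs (the deep part) before a sigmoid. All function names, feature names, and weights below are hypothetical and chosen only for illustration.

```python
import math


def wide_logit(active_features, wide_weights):
    """Wide part: a linear model over sparse (base and crossed) features."""
    return sum(wide_weights.get(name, 0.0) for name in active_features)


def deep_logit(dense_input, layers):
    """Deep part: a small MLP with ReLU hidden layers and a linear output unit."""
    hidden = dense_input
    for index, (weights, biases) in enumerate(layers):
        hidden = [
            sum(w * x for w, x in zip(row, hidden)) + bias
            for row, bias in zip(weights, biases)
        ]
        if index < len(layers) - 1:  # no activation on the output layer
            hidden = [max(0.0, value) for value in hidden]
    return hidden[0]


def wide_and_deep_probability(active_features, wide_weights, dense_input, layers):
    """Add both logits and squash the sum with a sigmoid."""
    z = wide_logit(active_features, wide_weights) + deep_logit(dense_input, layers)
    return 1.0 / (1.0 + math.exp(-z))


# Toy values (assumptions, not taken from any trained model).
wide_weights = {"gender=F": -0.2, "occupation=5_x_genre=Drama": 0.5}
layers = [
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),  # hidden layer: 2 -> 2
    ([[0.5, 0.5]], [0.2]),                    # output layer: 2 -> 1
]
probability = wide_and_deep_probability(
    ["gender=F", "occupation=5_x_genre=Drama"], wide_weights, [1.0, 2.0], layers
)
```

The wide part memorizes specific feature co-occurrences through its per-feature weights, while the deep part generalizes through the shared hidden layers; training both against one loss is what the mixed model refers to.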

After training the model, users can use the model to do prediction and recommendation.

Scala

val wideAndDeep = WideAndDeep(modelType = "wide_n_deep", numClasses, columnInfo, hiddenLayers = Array(40, 20, 10))

See here for the Scala example that trains the WideAndDeep model on the MovieLens 1M dataset and uses the model to do prediction and recommendation.

Python

wide_and_deep = WideAndDeep(class_num, column_info, model_type="wide_n_deep", hidden_layers=(40, 20, 10))

See here for the Python notebook that trains the WideAndDeep model on the MovieLens 1M dataset and uses the model to do prediction and recommendation.


Neural network-based Collaborative Filtering

NCF (He et al., 2017) uses a multi-layer perceptron to learn the user–item interaction function. At the same time, NCF can express and generalize matrix factorization under its framework. The includeMF (Boolean) option lets users build a NeuralCF model with or without matrix factorization.
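
The claim that NCF can express matrix factorization can be illustrated with a short sketch. It assumes the matrix factorization branch multiplies the user and item embeddings element-wise and then applies a weighted sum and an activation; with all-ones weights and an identity activation this reduces to the plain dot product used by matrix factorization. The function names and vectors are hypothetical, and this is not the NeuralCF code.

```python
def mf_score(user_vector, item_vector):
    """Classic matrix factorization: dot product of the two latent vectors."""
    return sum(p * q for p, q in zip(user_vector, item_vector))


def gmf_score(user_vector, item_vector, output_weights, activation=lambda z: z):
    """Generalized form: element-wise product, weighted sum, then activation."""
    interaction = [p * q for p, q in zip(user_vector, item_vector)]
    return activation(sum(h * x for h, x in zip(output_weights, interaction)))


user_vector = [1.0, 2.0, 3.0]
item_vector = [0.5, -1.0, 2.0]

# All-ones weights and identity activation recover plain matrix factorization.
plain = mf_score(user_vector, item_vector)
recovered = gmf_score(user_vector, item_vector, [1.0, 1.0, 1.0])

# Learned weights let the model stress some latent dimensions over others.
weighted = gmf_score(user_vector, item_vector, [2.0, 0.0, 1.0])
```

In the full model this branch would be combined with the multi-layer perceptron branch; the sketch covers only the part that relates to matrix factorization.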

After training the model, users can use the model to do prediction and recommendation.

Scala

val ncf = NeuralCF(userCount, itemCount, numClasses, userEmbed = 20, itemEmbed = 20, hiddenLayers = Array(40, 20, 10), includeMF = true, mfEmbed = 20)

See here for the Scala example that trains the NeuralCF model on the MovieLens 1M dataset and uses the model to do prediction and recommendation.

Python

ncf = NeuralCF(user_count, item_count, class_num, user_embed=20, item_embed=20, hidden_layers=(40, 20, 10), include_mf=True, mf_embed=20)

See here for the Python notebook that trains the NeuralCF model on the MovieLens 1M dataset and uses the model to do prediction and recommendation.


Session Recommender

Session Recommender (Hidasi, 2015) uses an RNN-based approach for session-based recommendations. The model was enhanced at NetEase (Wu, 2016) by adding multiple layers to model users' purchase history. In Analytics Zoo, the includeHistory (Boolean) option lets users build a SessionRecommender model with or without history.

After training the model, users can use the model to do prediction and recommendation.

Scala

val sessionRecommender = SessionRecommender(itemCount, itemEmbed, sessionLength, includeHistory, mlpHiddenLayers, historyLength)

See here for the Scala example that trains the SessionRecommender model on an ecommerce dataset provided by OfficeDepot and uses the model to do prediction and recommendation.

Python

session_recommender = SessionRecommender(item_count, item_embed, rnn_hidden_layers=[40, 20], session_length=10, include_history=True, mlp_hidden_layers=[40, 20], history_length=5)

Prediction and Recommendation

Predict for user-item pairs

Gives a prediction for each user-item pair. Returns an RDD of UserItemPrediction.

Scala

predictUserItemPair(featureRdd)

Python

predict_user_item_pair(feature_rdd)

Parameters: featureRdd (feature_rdd in Python), an RDD of UserItemFeature.
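
As a rough illustration of what a prediction and probability pair can look like, the sketch below turns one vector of class probabilities into the most likely label and its probability. This is an assumption about how such a pair could be derived, not a description of the library's internals. The helper name and the one_based flag are hypothetical; the flag reflects the assumption that labels such as ratings start at 1.

```python
def to_prediction(class_probabilities, one_based=True):
    """Return (label, probability) for the most likely class."""
    best_index = max(
        range(len(class_probabilities)), key=class_probabilities.__getitem__
    )
    label = best_index + 1 if one_based else best_index
    return label, class_probabilities[best_index]


# Five classes, e.g. ratings 1 to 5 (toy numbers).
rating, confidence = to_prediction([0.05, 0.1, 0.15, 0.5, 0.2])
```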

Recommend for users

Recommends a number of items for each user. Returns an RDD of UserItemPrediction. Only works for WND and NCF.

Scala

recommendForUser(featureRdd, maxItems)

Python

recommend_for_user(feature_rdd, max_items)

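Parameters: featureRdd (feature_rdd), an RDD of UserItemFeature; maxItems (max_items), the number of items to recommend for each user.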
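
Conceptually, recommending for users amounts to a per-user top-k selection over pair predictions. The sketch below shows that idea on plain Python records rather than Spark RDDs. The Prediction tuple and top_items_per_user are hypothetical stand-ins, and ranking by predicted label first and probability second is an assumption; the library's actual ordering may differ.

```python
from collections import defaultdict, namedtuple
import heapq

# Hypothetical local stand-in for a prediction record.
Prediction = namedtuple(
    "Prediction", ["user_id", "item_id", "prediction", "probability"]
)


def top_items_per_user(predictions, max_items):
    """Group predictions by user and keep the max_items best ones for each."""
    by_user = defaultdict(list)
    for record in predictions:
        by_user[record.user_id].append(record)
    return {
        user_id: heapq.nlargest(
            max_items, records, key=lambda r: (r.prediction, r.probability)
        )
        for user_id, records in by_user.items()
    }


predictions = [
    Prediction(1, 10, 5, 0.6),
    Prediction(1, 11, 5, 0.9),
    Prediction(1, 12, 3, 0.8),
    Prediction(2, 10, 4, 0.7),
]
top = top_items_per_user(predictions, max_items=2)
```

Recommending users for items follows the same pattern with the grouping key switched from user_id to item_id.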

Recommend for items

Recommends a number of users for each item. Returns an RDD of UserItemPrediction. Only works for WND and NCF.

Scala

recommendForItem(featureRdd, maxUsers)

Python

recommend_for_item(feature_rdd, max_users)

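Parameters: featureRdd (feature_rdd), an RDD of UserItemFeature; maxUsers (max_users), the number of users to recommend for each item.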

Recommend for sessions

Recommends a number of items for each session sequence. Returns the corresponding recommendations, each of which contains a sequence of (item, probability) pairs. Only works for Session Recommender.

Scala

recommendForSession(sessions, maxItems, zeroBasedLabel)

Python

recommend_for_session(sessions, max_items, zero_based_label)

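Parameters: sessions, the input session sequences; maxItems (max_items), the number of items to recommend for each session; zeroBasedLabel (zero_based_label), whether item labels start from 0 (true) or from 1 (false).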
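
The (item, probability) output for a session can be pictured as a top-k selection over the model's per-item probabilities. The sketch below is a hypothetical helper, not the library's code. It assumes the model emits one probability per item index and that a zero-based flag controls whether returned item IDs start at 0 or at 1.

```python
def top_items_for_session(item_probabilities, max_items, zero_based_label=True):
    """Return the max_items most probable items as (item, probability) pairs."""
    ranked = sorted(
        enumerate(item_probabilities), key=lambda pair: pair[1], reverse=True
    )
    offset = 0 if zero_based_label else 1  # shift indices for 1-based item IDs
    return [(index + offset, probability) for index, probability in ranked[:max_items]]


session_probabilities = [0.1, 0.6, 0.3]  # toy output for a 3-item catalogue
zero_based = top_items_for_session(session_probabilities, max_items=2)
one_based = top_items_for_session(
    session_probabilities, max_items=2, zero_based_label=False
)
```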


Model Save

After building and training a WideAndDeep, NeuralCF, or SessionRecommender model, you can save it for future use.

Scala

wideAndDeep.saveModel(path, weightPath = null, overWrite = false)

ncf.saveModel(path, weightPath = null, overWrite = false)

sessionRecommender.saveModel(path, weightPath = null, overWrite = false)

Python

wide_and_deep.save_model(path, weight_path=None, over_write=False)

ncf.save_model(path, weight_path=None, over_write=False)

session_recommender.save_model(path, weight_path=None, over_write=False)

Model Load

To load a WideAndDeep, NeuralCF, or SessionRecommender model (with weights) saved above:

Scala

WideAndDeep.loadModel[Float](path, weightPath = null)

NeuralCF.loadModel[Float](path, weightPath = null)

SessionRecommender.loadModel[Float](path, weightPath = null)

Python

WideAndDeep.load_model(path, weight_path=None)

NeuralCF.load_model(path, weight_path=None)

SessionRecommender.load_model(path, weight_path=None)

UserItemFeature

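Represents user-item records with features.

Each record should contain the following fields: userId (user_id), itemId (item_id), and sample, a Sample that holds the features and label of the record.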

Scala

UserItemFeature(userId, itemId, sample)

Python

UserItemFeature(user_id, item_id, sample)

UserItemPrediction

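Represents the prediction results of user-item pairs.

Each prediction record contains the following information: userId (user_id), itemId (item_id), prediction, the predicted label for the pair, and probability, the probability of that prediction.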

Scala

UserItemPrediction(userId, itemId, prediction, probability)

Python

UserItemPrediction(user_id, item_id, prediction, probability)

ColumnFeatureInfo

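An instance of ColumnFeatureInfo holds the column information shared by the WideAndDeep model and its feature generation part.

You can choose to include the following information for feature engineering and the WideAndDeep model; the available fields are listed in the constructors below. Columns in wideBaseCols and wideCrossCols feed the wide part of the model, while indicatorCols, embedCols, and continuousCols feed the deep part.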

Remark:

Fields that involve Cols should be an array of String (Scala) or a list of String (Python) indicating the name of the columns in the data.

Fields that involve Dims should be an array of integers (Scala) or a list of integers (Python) indicating the dimensions of the corresponding columns.

If a field is not specified, it defaults to an empty array (Scala) or an empty list (Python).
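
The remarks above can be mirrored in a small stand-in class. ColumnInfoSketch below is hypothetical and covers only a subset of the fields. It shows unspecified fields falling back to empty lists, and it adds a consistency check based on the assumption that every Cols list should have one entry in its Dims list per column. That check is an illustration, not a documented behavior of ColumnFeatureInfo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnInfoSketch:
    """Hypothetical stand-in for a subset of the ColumnFeatureInfo fields."""

    wide_base_cols: List[str] = field(default_factory=list)
    wide_base_dims: List[int] = field(default_factory=list)
    embed_cols: List[str] = field(default_factory=list)
    embed_in_dims: List[int] = field(default_factory=list)
    embed_out_dims: List[int] = field(default_factory=list)
    continuous_cols: List[str] = field(default_factory=list)
    label: str = "label"

    def mismatched_fields(self):
        """List (cols, dims) field pairs whose lengths disagree."""
        pairs = [
            ("wide_base_cols", "wide_base_dims"),
            ("embed_cols", "embed_in_dims"),
            ("embed_cols", "embed_out_dims"),
        ]
        return [
            (cols, dims)
            for cols, dims in pairs
            if len(getattr(self, cols)) != len(getattr(self, dims))
        ]


# Column names and dimensions here are made up for illustration.
info = ColumnInfoSketch(
    wide_base_cols=["occupation", "gender"], wide_base_dims=[21, 3]
)
incomplete = ColumnInfoSketch(embed_cols=["userId"], embed_in_dims=[6040])
```

Running mismatched_fields() on incomplete reports the embed_cols and embed_out_dims pair, because no output dimension was given for the embedded column.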

Scala

ColumnFeatureInfo(
    wideBaseCols = Array[String](),
    wideBaseDims = Array[Int](),
    wideCrossCols = Array[String](),
    wideCrossDims = Array[Int](),
    indicatorCols = Array[String](),
    indicatorDims = Array[Int](),
    embedCols = Array[String](),
    embedInDims = Array[Int](),
    embedOutDims = Array[Int](),
    continuousCols = Array[String](),
    label = "label")

Python

ColumnFeatureInfo(
    wide_base_cols=None,
    wide_base_dims=None,
    wide_cross_cols=None,
    wide_cross_dims=None,
    indicator_cols=None,
    indicator_dims=None,
    embed_cols=None,
    embed_in_dims=None,
    embed_out_dims=None,
    continuous_cols=None,
    label="label")