Retrieval using only one feature of User

I want to build a deep recommender system to predict movies for a given user. My dataset contains user information such as id, gender, and city (the same user can appear in several rows with different cities) and movie information such as title and genre. I can train the model on this dataset with the user information in the query tower and the movie information in the candidate tower. But during retrieval, I only have the user’s id (this user id is also in the dataset). How can I give only the user id embedding to the BruteForce layer to predict movies? In the deep recommender model, we would write brute_force = tfrs.layers.factorized_top_k.BruteForce(model.query_model.embedding_model.user_embedding), but how do I take the user’s past locations and gender into account during retrieval when I can’t pass the whole query model?

Issue Analytics

Created: a year ago
Comments: 5

Top GitHub Comments

patrickorlando commented, Jun 19, 2022

@dexter1729

Yes, it’s the trained model. You are calculating the query vector based on a different dataset, containing only one row per user.
To pass multiple locations to the model, you would need to modify it to accept an array of user locations and take the average vector. Since you have no such examples during training, the results could be garbage. Alternatively, you could take the most recent location for that user. In either case, you will not be capturing the true user location at inference time and are therefore introducing training-serving skew. You can still give this a try depending on your use case and requirements, but your results may vary. Again, the correct way to handle this is to serve these features to the model at inference time.
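
To make the averaging option concrete, here is a minimal framework-free sketch in plain Python. The location table, its values, and the helper name are hypothetical and chosen only for illustration. In an actual model this would presumably be an embedding lookup over an array of locations followed by a mean over that axis.

```python
from statistics import fmean

# Hypothetical 2-d location embeddings; in a real model these are learned weights.
LOCATION_EMBEDDINGS = {
    "london": [1.0, 0.0],
    "paris": [0.0, 1.0],
    "berlin": [1.0, 1.0],
}


def average_location_vector(locations):
    """Return the element-wise mean of the embeddings for the given locations."""
    if not locations:
        raise ValueError("at least one location is required")
    vectors = [LOCATION_EMBEDDINGS[name] for name in locations]
    # zip(*vectors) groups the values dimension by dimension.
    return [fmean(dim) for dim in zip(*vectors)]


# A user seen in two cities gets the midpoint of the two location vectors.
user_location_vector = average_location_vector(["london", "paris"])
```

The "most recent location" alternative would amount to passing only the last entry of the user's location history. Either way the skew caveat applies, because the model never saw averaged or substituted locations during training.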

patrickorlando commented, Jun 17, 2022

Here’s the general idea, @dexter1729. I haven’t tested this, so you may encounter some errors. As I mentioned, this removes the ability to pass different values to the query model at inference time. A user_id will be the only input.

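import tensorflow as tf
import tensorflow_recommenders as tfrs

# users_ds is a batched tf.data.Dataset with each record containing a user_id and the other features.
# Each user only appears once.
user_id_batches = []
user_vec_batches = []
for batch in users_ds:
    user_id_batches.append(batch['user_id'])
    user_vec_batches.append(model.query_model(batch))

serving_user_ids = tf.concat(user_id_batches, axis=0)
serving_user_vecs = tf.concat(user_vec_batches, axis=0)

# Use the static shape so both values are plain Python ints.
num_users, vector_dim = serving_user_vecs.shape

serving_user_embedding_layer = tf.keras.layers.Embedding(num_users, vector_dim, mask_zero=False)

# The layer has to be built before its weights can be set.
serving_user_embedding_layer.build((None,))
serving_user_embedding_layer.set_weights([serving_user_vecs.numpy()])

# With no mask token and no OOV indices, the i-th user_id maps to row i of the embedding table.
serving_user_lookup = tf.keras.layers.StringLookup(
    vocabulary=serving_user_ids, mask_token=None, num_oov_indices=0
)

# Chain the lookup and the embedding into a single query model.
brute_force = tfrs.layers.factorized_top_k.BruteForce(tf.keras.Sequential([
    serving_user_lookup,
    serving_user_embedding_layer
]))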
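
For intuition about what the snippet above sets up, here is a small plain-Python sketch of the same mechanism: query vectors are computed once per user and stored in a table, so retrieval becomes an ID lookup followed by a dot-product scan over the candidates. All names and vectors here are made up for illustration, and this is not the TFRS implementation.

```python
# Hypothetical precomputed data: one query vector per user, one vector per movie.
user_ids = ["u1", "u2"]
user_vecs = [[1.0, 0.0], [0.0, 1.0]]
movie_titles = ["Alien", "Amelie", "Heat"]
movie_vecs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]

# Plays the role of StringLookup: user_id -> row index in the vector table.
user_index = {uid: i for i, uid in enumerate(user_ids)}


def top_k_movies(user_id, k=2):
    """Score every movie against the stored user vector and return the k best titles."""
    # Plays the role of the Embedding layer: fetch the precomputed query vector.
    query = user_vecs[user_index[user_id]]
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in movie_vecs]
    ranked = sorted(range(len(movie_vecs)), key=lambda i: scores[i], reverse=True)
    return [movie_titles[i] for i in ranked[:k]]
```

In TFRS itself, the candidates would still need to be indexed on the BruteForce layer (for example with index_from_dataset) before it can be queried with a batch of user ids.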