How to use Candidate Sampling Probabilities for bias correction?
See original GitHub issue

Context: I have used a candidate model to successfully create embeddings for users and products that are representative of their true sizes. My positive interactions are user:sku pairs that fit a user (a sku is a fashion item of a given size).
Problem: Although my size-prediction task performs quite well using the cosine similarity between the learned user and sku embeddings (70% accuracy), I noticed that my mispredictions are all biased in a single direction (e.g. predicting a smaller size than the true one). From researching the issue, I believe the problem may have something to do with bias introduced by the negative sampling strategy (uniform in-batch sampling). My thinking is that uniform in-batch sampling causes the more common user sizes (e.g. 2, 4, 6) to be sampled as negatives more often than the less common ones (e.g. 8, 10, 12), and hence skews the resulting embeddings.
Question: I noticed that the retrieval task accepts a candidate_sampling_probability argument that can be used to correct for exactly this kind of issue. I was wondering if there is any guidance on how best to:
- Calculate the candidate sampling probabilities? For each candidate, do we just compute the probability of picking it in a random batch as num_times_candidate_appears_in_interactions / num_all_interactions, or is something more involved needed?
- Pass the sampling probabilities to the model? At the moment I'm (a) passing an array of candidate sampling probabilities with each interaction (i.e. as part of my features), and then (b) doing some work to convert the sampling array into the correct shape before passing it into my task each time:
```python
feature_1 = {'user_id': 10211, 'sku': 'ABC2', 'candidate_sampling_probabilities': [0.1, 0.2, 0.3, 0.01, ...]}
feature_2 = {'user_id': 12111, 'sku': 'AEQ4', 'candidate_sampling_probabilities': [0.1, 0.2, 0.3, 0.01, ...]}
```
```python
def compute_loss(self, features, training=False):
    sku_embeddings = self.sku_model(features)
    user_embeddings = self.user_model(features)
    # (b): collapse the per-example probability arrays into a single
    # [batch_size] vector (note: .numpy() requires eager execution)
    candidate_sampling_array = features['candidate_sampling_probabilities'].numpy()
    candidate_sampling_prob = np.squeeze(np.sum(candidate_sampling_array, axis=-1))
    return self.task(
        query_embeddings=user_embeddings,
        candidate_embeddings=sku_embeddings,
        candidate_sampling_probability=candidate_sampling_prob,
        compute_metrics=not training,
    )
```
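(A possible simplification of step (b), sketched in plain Python: if the task only needs the probability of each candidate actually present in the batch, a single scalar probability per interaction suffices, looked up from a precomputed table. The dict values and helper name below are hypothetical.)

```python
# Hypothetical precomputed lookup: sku -> empirical sampling probability.
sampling_prob = {"ABC2": 0.5, "AEQ4": 0.3, "XYZ9": 0.2}

def batch_sampling_probs(batch_skus):
    # One probability per candidate in the batch: this already has the
    # [batch_size] shape that candidate_sampling_probability expects,
    # so no reshaping or summing is needed.
    return [sampling_prob[sku] for sku in batch_skus]

print(batch_sampling_probs(["AEQ4", "ABC2"]))  # [0.3, 0.5]
```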
Apologies if this issue repeats some of the comments in these issues, but I thought it would be useful for others to have a solution to the above question laid out somewhere that's easy to retrieve. https://github.com/tensorflow/recommenders/issues/232 https://github.com/tensorflow/recommenders/issues/140
Thanks, Niall
Issue analytics: created 3 years ago · 18 comments

Hey @apdullahyayik, perhaps I can help answer your question.
There are a few things we need to clarify first.
The candidate sampling probability is based on the frequency of the item over the entire training set, not the frequency within a batch.
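Concretely, the first option from the question is right: the probability is just each item's empirical frequency over all training interactions. A minimal sketch (the interaction list here is made up):

```python
from collections import Counter

# Hypothetical log of positive interactions: the sku of each (user, sku) pair.
interactions = ["ABC2", "AEQ4", "ABC2", "XYZ9", "ABC2", "AEQ4"]

counts = Counter(interactions)
total = len(interactions)

# Empirical probability of each candidate appearing at a given slot of a
# uniformly sampled batch: count / total number of interactions.
sampling_prob = {sku: n / total for sku, n in counts.items()}

print(sampling_prob["ABC2"])  # 3 of 6 interactions -> 0.5
```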
When training a retrieval model, there are only positive targets. If you have explicit positive and negative ratings, these are used within the ranking model. If you don’t plan to build a ranking model, you should start by filtering out your negatives and only include the positive interactions.
When you sample a batch of size N, you get an NxN matrix of scores. The diagonal of this matrix holds the scores for the true positive (user, item) pairs; all off-diagonal entries in each row are used as implicit negatives. This is why the labels matrix is the identity matrix.
Why do we need the candidate sampling probability? Because we use in-batch negatives, more popular items occur in batches more frequently and therefore get used as negatives far more often than less popular items. The candidate sampling probability is used to correct for this sampling bias.
For that NxN matrix of scores, we have a (1xN) vector of the sampling probabilities of the items in the batch. The log of each probability is subtracted from every row of the scores matrix.
Example: let S be the scores matrix, P the candidate probabilities vector, and Y the (identity) labels matrix. The corrected scores are S' = S - log(P), with log(P) broadcast across and subtracted from every row, and the loss is the softmax cross-entropy between Y and S'.
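A toy NumPy version of that correction (the scores and probabilities are made up; this is a sketch of the mechanics, not the TFRS implementation):

```python
import numpy as np

# Toy batch of N=3 (user, item) pairs.
S = np.array([[5.0, 2.0, 1.0],
              [1.0, 4.0, 2.0],
              [0.5, 1.0, 3.0]])   # N x N scores: row i = user i vs all 3 items
P = np.array([0.6, 0.3, 0.1])    # sampling probability of each in-batch item
Y = np.eye(3)                    # labels: positives on the diagonal

# Bias correction: subtract log(P) from every row of S (broadcasting).
# Since log(P) < 0, all scores increase, but the rare item (P = 0.1)
# gets the largest boost, offsetting how rarely it appears as a negative.
S_corrected = S - np.log(P)

# Softmax cross-entropy against the identity labels.
logits = S_corrected - S_corrected.max(axis=1, keepdims=True)  # numerical stability
log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -(Y * log_softmax).sum(axis=1).mean()
```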
No worries @nialloh23! Yep, there is a significant improvement on my test set when using the bias correction. You can easily sanity-check it by taking the reciprocal of the probability, since log(1/x) = -log(x). Here's a performance benchmark on MovieLens-100k: you can see how the inverted correction (logits + math.log(q)) reduces performance relative to no bias correction, while the proper correction improves it.
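The sign flip behind that sanity check is just the log identity; a tiny sketch with made-up numbers:

```python
import math

q = 0.25      # example sampling probability of an item
logit = 2.0   # example raw score for that item

proper = logit - math.log(q)       # correct direction: boosts the rare item
inverted = logit - math.log(1 / q) # using the reciprocal probability instead

# log(1/q) == -log(q), so the inverted variant equals logit + log(q),
# i.e. it penalises rare items further and hurts performance.
assert math.isclose(inverted, logit + math.log(q))
print(proper, inverted)
```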