How to use Candidate Sampling Probabilities for bias correction?
See original GitHub issue

Context: I have used a candidate model to successfully create embeddings for users and products that are representative of their true sizes. My positive interactions are user:sku pairs that fit a user (a sku is a fashion item of a given size).
Problem: Although my size-prediction task performs quite well using the cosine similarity between the learned user and sku embeddings (70% accuracy), I noticed that my mispredictions are all biased in a single direction (e.g. predicting a smaller size than the true one). From researching the issue, I believe the problem may have something to do with bias introduced by the negative sampling strategy (uniform in-batch sampling). My thinking is that uniform in-batch sampling causes the more common user sizes (e.g. 2, 4, 6) to be sampled as negatives more often than the less common ones (e.g. 8, 10, 12), and hence skews the resulting embeddings.
Question: I noticed that the retrieval task accepts a candidate_sampling_probability argument that can be used to correct for exactly this kind of issue. I was wondering if there is any guidance on how best to:
- Calculate the candidate sampling probabilities? For each candidate, do we just compute the probability of picking it in a random batch as num_times_candidate_appears_in_interactions / num_all_interactions, or is something more involved needed?
- Pass the sampling probabilities to the model? At the moment I'm (a) passing an array of candidate sampling probabilities with each interaction (i.e. as part of my features), and then (b) doing some work to convert the sampling array into the correct shape before passing it into my task each time:
```python
feature_1 = {'user_id': 10211, 'sku': 'ABC2', 'candidate_sampling_probabilities': [0.1, 0.2, 0.3, 0.01, ...]}
feature_2 = {'user_id': 12111, 'sku': 'AEQ4', 'candidate_sampling_probabilities': [0.1, 0.2, 0.3, 0.01, ...]}
```
```python
def compute_loss(self, features, training=False):
    sku_embeddings = self.sku_model(features)
    user_embeddings = self.user_model(features)
    # (b): collapse the per-example probability arrays into a single
    # [batch_size] vector (note: .numpy() requires eager execution)
    candidate_sampling_array = features['candidate_sampling_probabilities'].numpy()
    candidate_sampling_prob = np.squeeze(np.sum(candidate_sampling_array, axis=-1))
    return self.task(
        query_embeddings=user_embeddings,
        candidate_embeddings=sku_embeddings,
        candidate_sampling_probability=candidate_sampling_prob,
        compute_metrics=not training,
    )
```
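(A possible simplification of step (b), sketched in plain Python: if the task only needs the probability of each candidate actually present in the batch, a single scalar probability per interaction suffices, looked up from a precomputed table. The dict values and helper name below are hypothetical.)

```python
# Hypothetical precomputed lookup: sku -> empirical sampling probability.
sampling_prob = {"ABC2": 0.5, "AEQ4": 0.3, "XYZ9": 0.2}

def batch_sampling_probs(batch_skus):
    # One probability per candidate in the batch: this already has the
    # [batch_size] shape that candidate_sampling_probability expects,
    # so no reshaping or summing is needed.
    return [sampling_prob[sku] for sku in batch_skus]

print(batch_sampling_probs(["AEQ4", "ABC2"]))  # [0.3, 0.5]
```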
Apologies if this issue repeats some of the comments in these issues, but I thought it would be useful for others to have a solution to the above question laid out somewhere that's easy to retrieve. https://github.com/tensorflow/recommenders/issues/232 https://github.com/tensorflow/recommenders/issues/140
Thanks, Niall
Issue analytics: created 3 years ago · 18 comments

Hey @apdullahyayik, perhaps I can help answer your question.
There are a few things we need to clarify first.
The candidate sampling probability is based on the frequency of the item over the entire training set, not the frequency within a batch.
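Concretely, the first option from the question is right: the probability is just each item's empirical frequency over all training interactions. A minimal sketch (the interaction list here is made up):

```python
from collections import Counter

# Hypothetical log of positive interactions: the sku of each (user, sku) pair.
interactions = ["ABC2", "AEQ4", "ABC2", "XYZ9", "ABC2", "AEQ4"]

counts = Counter(interactions)
total = len(interactions)

# Empirical probability of each candidate appearing at a given slot of a
# uniformly sampled batch: count / total number of interactions.
sampling_prob = {sku: n / total for sku, n in counts.items()}

print(sampling_prob["ABC2"])  # 3 of 6 interactions -> 0.5
```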
When training a retrieval model, there are only positive targets. If you have explicit positive and negative ratings, these are used within the ranking model. If you don’t plan to build a ranking model, you should start by filtering out your negatives and only include the positive interactions.
When you sample a batch of size N, you get an NxN matrix of scores. The diagonal of this matrix holds the scores for the true positive (user, item) pairs; all off-diagonal entries in each row are used as implicit negatives. This is why the labels matrix is the identity matrix.
Why do we need the candidate sampling probability? Because we use in-batch negatives, more popular items occur in batches more frequently and therefore get used as negatives far more often than less popular items. The candidate sampling probability is used to correct for this sampling bias.
For that NxN matrix of scores, we have a (1xN) vector of the sampling probabilities of the items in the batch. The log of each probability is subtracted from every row of the scores matrix.
Example: let S be the scores matrix, P the candidate probabilities vector, and Y the (identity) labels matrix. The corrected scores are S' = S - log(P), with log(P) broadcast across and subtracted from every row, and the loss is the softmax cross-entropy between Y and S'.
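A toy NumPy version of that correction (the scores and probabilities are made up; this is a sketch of the mechanics, not the TFRS implementation):

```python
import numpy as np

# Toy batch of N=3 (user, item) pairs.
S = np.array([[5.0, 2.0, 1.0],
              [1.0, 4.0, 2.0],
              [0.5, 1.0, 3.0]])   # N x N scores: row i = user i vs all 3 items
P = np.array([0.6, 0.3, 0.1])    # sampling probability of each in-batch item
Y = np.eye(3)                    # labels: positives on the diagonal

# Bias correction: subtract log(P) from every row of S (broadcasting).
# Since log(P) < 0, all scores increase, but the rare item (P = 0.1)
# gets the largest boost, offsetting how rarely it appears as a negative.
S_corrected = S - np.log(P)

# Softmax cross-entropy against the identity labels.
logits = S_corrected - S_corrected.max(axis=1, keepdims=True)  # numerical stability
log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -(Y * log_softmax).sum(axis=1).mean()
```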
No worries @nialloh23! Yep, there is a significant improvement on my test set when using the bias correction. You can easily sanity-check it by taking the reciprocal of the probability, since log(1/x) = -log(x). Here's a performance benchmark on MovieLens-100k: you can see how the inverted correction (logits + math.log(q)) reduces performance relative to no bias correction, while the proper correction improves it.
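The sign flip behind that sanity check is just the log identity; a tiny sketch with made-up numbers:

```python
import math

q = 0.25      # example sampling probability of an item
logit = 2.0   # example raw score for that item

proper = logit - math.log(q)       # correct direction: boosts the rare item
inverted = logit - math.log(1 / q) # using the reciprocal probability instead

# log(1/q) == -log(q), so the inverted variant equals logit + log(q),
# i.e. it penalises rare items further and hurts performance.
assert math.isclose(inverted, logit + math.log(q))
print(proper, inverted)
```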