[Question]: How to handle negative samples?

Hi @maciejkula thanks again for a great library!

I have another question, which is a little theoretical: I would like to understand how to handle negative examples explicitly in this library. As a dataset we provide positive rows (person x product_purchased), and the library handles selecting negative examples during training.

However, in some domains it is important to provide explicit negative samples. For example, in advertising we have (ad x web_page) pairs, and most of the time a pairing produces no positive interaction, such as a click. So to measure how well a specific ad performs on a specific page, you need to know how many times it was shown, and you end up with:

ad    | page         | clicks | impressions
shoes | shopping.com | 10     | 1000

From this we can calculate a click-through rate (CTR), here 10 / 1000 = 1%.

So my question is how best to handle such a dataset, where there is not just a positive interaction, but also data about how many times that positive interaction had a chance to occur.

My basic idea:

  • decide a threshold for a “good” CTR, and convert to binary labels
  • add a weight per sample based on the number of impressions (more impressions means a higher weight)
  • drop all negative examples from the data set
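The three steps above can be sketched in plain Python. This is only an illustration of the idea as described, not a recommended recipe: the `AdPageStats` class, the `to_weighted_positives` helper, the 0.5% threshold, and the second data row are all assumptions made up for the example, and using the raw impression count as the weight is just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class AdPageStats:
    """Aggregated counts for one (ad, page) pair (hypothetical container)."""
    ad: str
    page: str
    clicks: int
    impressions: int


def to_weighted_positives(rows, ctr_threshold):
    """Keep pairs whose CTR meets the threshold, weighted by impressions."""
    positives = []
    for row in rows:
        if row.impressions == 0:
            continue  # CTR is undefined without any impressions
        ctr = row.clicks / row.impressions
        if ctr >= ctr_threshold:
            # Binary label is implied: every returned row is a positive.
            positives.append((row.ad, row.page, float(row.impressions)))
    return positives


rows = [
    AdPageStats("shoes", "shopping.com", 10, 1000),  # row from the question
    AdPageStats("shoes", "news.example", 1, 2000),   # invented row, low CTR
]
positives = to_weighted_positives(rows, ctr_threshold=0.005)
```

Pairs below the threshold are simply dropped here, which is the information loss the question is worried about.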

But I am sure there is a better way! Any help would be appreciated!

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

2 reactions
patrickorlando commented, Apr 28, 2022

Hey @ydennisy, as far as I know, there is no way to handle explicit negative samples in the retrieval stage. My approach would be to train the retrieval model on positive examples only, and then train a separate ranking model on all examples.
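A rough sketch of how the data could be split for that two-stage setup, in plain Python. The `split_for_two_stage` helper is hypothetical, as is the second data row. Using the observed CTR as the ranking label and the impression count as a sample weight are assumptions for illustration, not something the library prescribes.

```python
def split_for_two_stage(rows):
    """Split aggregated (ad, page, clicks, impressions) rows into two datasets.

    Retrieval gets only pairs with at least one click (positives only).
    Ranking gets every pair that was shown, including those with no clicks.
    """
    retrieval_pairs = []
    ranking_examples = []
    for ad, page, clicks, impressions in rows:
        if impressions <= 0:
            continue  # never shown, so no evidence either way
        if clicks > 0:
            retrieval_pairs.append((ad, page))
        ranking_examples.append({
            "ad": ad,
            "page": page,
            "label": clicks / impressions,  # assumed label: observed CTR
            "weight": impressions,          # assumed weight: exposure count
        })
    return retrieval_pairs, ranking_examples


rows = [
    ("shoes", "shopping.com", 10, 1000),  # row from the question
    ("shoes", "news.example", 0, 500),    # invented explicit negative
]
retrieval_pairs, ranking_examples = split_for_two_stage(rows)
```

The explicit negative (zero clicks over 500 impressions) never reaches the retrieval model but is kept for the ranking model, which is where the impression data gets used.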

For your second question, you’ll probably find https://github.com/tensorflow/recommenders/issues/334#issuecomment-894873355 and the examples that follow it helpful.

1 reaction
xiaoyaoyang commented, May 23, 2022

@ydennisy yeah… #334 has relevant discussion. In your example, the matrix would look like this:

      | stereo | hifi | hifi | cd player
bob   | 1      | 0    | 0    | 0
bob   | 0      | 1    | 0    | 0
alice | 0      | 0    | 1    | 0
alice | 0      | 0    | 0    | 1

Let’s call this 4×4 matrix M, with M(0,0) denoting the upper-left element. Then:

  1. M(0,1) and M(0,2) are both zero; this is considered a regularization effect.
  2. The candidate of M(1,2) is the same as the candidate of M(1,1) (likewise for M(2,1) and M(2,2)); this is considered an accidental hit.
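To make the two cases concrete, here is a small plain-Python sketch that builds the in-batch label matrix for this example and lists the accidental hits. The helper names `in_batch_labels` and `accidental_hits` are made up for illustration; this is not how the library implements it.

```python
def in_batch_labels(candidates):
    """Identity label matrix: row i's positive is candidate i, the rest are negatives."""
    n = len(candidates)
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def accidental_hits(candidates):
    """Off-diagonal cells whose candidate equals that row's own positive candidate."""
    n = len(candidates)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and candidates[i] == candidates[j]
    ]


# One batch of four (user, candidate) positives, as in the matrix above.
users = ["bob", "bob", "alice", "alice"]
candidates = ["stereo", "hifi", "hifi", "cd player"]

labels = in_batch_labels(candidates)
hits = accidental_hits(candidates)  # [(1, 2), (2, 1)]
```

The cells in `hits` are labeled 0 even though they point at the same item as the row's positive. As far as I'm aware, `tfrs.tasks.Retrieval` has a `remove_accidental_hits` option that masks such cells when candidate ids are passed to the task, but check the API docs for the exact behaviour.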