[Question]: How to handle negative samples?
Hi @maciejkula, thanks again for a great library!
I have another question, a somewhat theoretical one: I would like to understand how to handle negative examples explicitly in this library. As a dataset we provide positive rows (person x product_purchased), and the library handles selecting negative examples during training.
However, in some domains it is important to provide explicit negative samples. For example, in advertising we have (ad x web_page) pairs, and most of the time a pairing produces no positive interaction, such as a click. So to measure how well a specific ad performs on a specific page you need to know how many times it was shown, and you end up with:
ad | page | clicks | impressions
--- | --- | --- | ---
shoes | shopping.com | 10 | 1000
From this we can calculate a click-through rate (CTR), here 10 / 1000 = 1%.
So my question is how best to handle such a dataset, which records not just the positive interactions but also how many opportunities each interaction had to occur.
My basic idea:
- decide a threshold for a “good” CTR, and convert to binary labels
- add a weight per sample based on the number of impressions (more impressions means a higher weight)
- drop all negative examples from the data set
But I am sure there is a better way! Any help would be appreciated!
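The three steps of the basic idea above could be sketched roughly as follows. This is only an illustration: the `AdPageStats` type, the `to_weighted_positives` helper, the 0.5% threshold, the log-scaled weight, and the second sample row are all assumptions made for the example, not part of any library or of the original data.

```python
import math
from typing import NamedTuple


class AdPageStats(NamedTuple):
    """One aggregated (ad, page) row; a hypothetical container for this sketch."""
    ad: str
    page: str
    clicks: int
    impressions: int


def to_weighted_positives(rows, ctr_threshold=0.005):
    """Turn aggregated click/impression rows into weighted positive examples.

    ctr_threshold and the log-scaled weight are illustrative assumptions.
    """
    examples = []
    for row in rows:
        if row.impressions == 0:
            continue  # never shown: no evidence either way
        ctr = row.clicks / row.impressions
        # Step 1: binarise on the CTR threshold.
        # Step 3: rows that fall below it (negatives) are dropped.
        if ctr < ctr_threshold:
            continue
        # Step 2: more impressions -> higher weight; log1p damps very large counts.
        weight = math.log1p(row.impressions)
        examples.append((row.ad, row.page, weight))
    return examples


rows = [
    AdPageStats("shoes", "shopping.com", 10, 1000),  # CTR = 0.01, from the question
    AdPageStats("hats", "news.example", 1, 1000),    # hypothetical low-CTR row
]
positives = to_weighted_positives(rows)
```

With these inputs only the `shoes` row survives the threshold. Whether a log-scaled weight or a raw impression count works better would depend on the data.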

Hey @ydennisy, as far as I know, there is no way to handle explicit negative samples in the retrieval stage. My approach would be to train the retrieval model on only positive examples, and then train a separate ranking model on all examples.
For your second question, you’ll probably find https://github.com/tensorflow/recommenders/issues/334#issuecomment-894873355 and the following examples helpful.
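As a rough sketch of the data preparation behind the two-stage suggestion above (positives only for retrieval, all examples for ranking), the aggregated rows could be split like this. The `split_for_two_stage` helper, the dictionary keys, and the second sample row are hypothetical, and nothing here is a library API; it only shows which rows would feed which model.

```python
def split_for_two_stage(rows):
    """Split (ad, page, clicks, impressions) tuples into two training sets.

    Hypothetical helper: the retrieval set keeps only pairs with at least one
    click, while the ranking set keeps every pair that was shown.
    """
    retrieval_pairs = []   # positives only
    ranking_examples = []  # positives and explicit negatives
    for ad, page, clicks, impressions in rows:
        if impressions <= 0:
            continue  # never shown: nothing to learn from
        if clicks > 0:
            # Assumption: weight a positive pair by its click count.
            retrieval_pairs.append({"ad": ad, "page": page, "weight": clicks})
        # Assumption: the ranking label is the observed CTR, weighted by impressions.
        ranking_examples.append({
            "ad": ad,
            "page": page,
            "label": clicks / impressions,
            "weight": impressions,
        })
    return retrieval_pairs, ranking_examples


rows = [
    ("shoes", "shopping.com", 10, 1000),  # row from the question
    ("hats", "news.example", 0, 500),     # hypothetical pair with no clicks
]
retrieval_pairs, ranking_examples = split_for_two_stage(rows)
```

The pair with no clicks appears only in the ranking set, which is where the explicit negative signal ends up under this approach.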
@ydennisy yeah… #334 has relevant discussion: in your example, the matrix would look like this.
Let’s call this 4*4 matrix M, with M(0,0) denoting the upper-left element. Then: