[Question]: How to handle negative samples?

Hi @maciejkula thanks again for a great library!

I have another question, which is a little theoretical: I would like to understand how to handle negative examples explicitly in this library. As a dataset we provide positive rows (person x product_purchased), and the library handles selecting negative examples during training.

However, in some domains it is important to provide explicit negative samples. In advertising, for example, we have (ad x web_page) pairs, and most of the time a pairing produces no positive interaction such as a click. To measure how well a specific ad performs on a specific page, you need to know how many times it was shown, so you end up with:

ad    | page         | clicks | impressions
shoes | shopping.com | 10     | 1000

From this we can calculate a click-through rate (CTR), here 10 / 1000 = 1%.

So my question is how best to handle such a dataset, where there is not just a positive interaction, but there is also data about how many times that positive interaction would have had a chance to form.

My basic idea:

- decide a threshold for a “good” CTR, and convert to binary labels
- add a weight per sample based on the number of impressions, so more impressions means a higher weight
- drop all negative examples from the data set
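The steps listed above could be sketched roughly as follows. This is only an illustration in plain Python, not anything provided by the library: the `CTR_THRESHOLD` value, the second data row, and the helper name `to_weighted_examples` are all assumptions made up for the example.

```python
# Hypothetical rows: (ad, page, clicks, impressions).
# The first row is the one from the question; the second is invented
# purely to show a row that falls below the threshold.
rows = [
    ("shoes", "shopping.com", 10, 1000),
    ("shoes", "news.example", 1, 2000),
]

CTR_THRESHOLD = 0.005  # assumed cut-off for a "good" CTR


def to_weighted_examples(rows, threshold):
    """Turn aggregated rows into binary-labelled, impression-weighted examples."""
    examples = []
    for ad, page, clicks, impressions in rows:
        if impressions == 0:
            continue  # CTR is undefined without impressions
        ctr = clicks / impressions
        examples.append({
            "ad": ad,
            "page": page,
            "ctr": ctr,
            "label": 1 if ctr >= threshold else 0,  # step 1: binary label
            "weight": float(impressions),           # step 2: weight by impressions
        })
    return examples


examples = to_weighted_examples(rows, CTR_THRESHOLD)
# Step 3: keep only the positives.
positives = [e for e in examples if e["label"] == 1]
```

Raw impression counts can vary by orders of magnitude, so in practice the weight might need dampening (for example with a logarithm); that choice is left open here.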

But I am sure there is a better way! Any help would be appreciated!

patrickorlando commented, Apr 28, 2022

Hey @ydennisy, as far as I know, there is no way to handle explicit negative samples in the retrieval stage. My approach would be to train the retrieval model on positive examples only, and then train a separate ranking model on all examples, positive and negative.

For your second question, you’ll probably find https://github.com/tensorflow/recommenders/issues/334#issuecomment-894873355 and the examples that follow it helpful.
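One way to prepare data for the two-stage split suggested above is sketched below. It only covers data preparation in plain Python and uses no library API; the helper names `retrieval_pairs` and `ranking_examples` are hypothetical, and representing each row as one weighted click example plus one weighted no-click example is an assumption, not something stated in the thread.

```python
# Aggregated row from the question: (ad, page, clicks, impressions).
rows = [("shoes", "shopping.com", 10, 1000)]


def retrieval_pairs(rows):
    """Positive (ad, page) pairs only, for the retrieval stage."""
    return [(ad, page) for ad, page, clicks, _ in rows if clicks > 0]


def ranking_examples(rows):
    """(ad, page, label, weight) tuples covering clicks and non-clicks."""
    out = []
    for ad, page, clicks, impressions in rows:
        non_clicks = impressions - clicks
        if clicks > 0:
            out.append((ad, page, 1, clicks))      # clicked impressions
        if non_clicks > 0:
            out.append((ad, page, 0, non_clicks))  # impressions without a click
    return out


retrieval_data = retrieval_pairs(rows)
ranking_data = ranking_examples(rows)
```

The retrieval stage then sees only pairs that produced at least one click, while the ranking stage sees the full impression counts as example weights.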

xiaoyaoyang commented, May 23, 2022

@ydennisy yeah… #334 has a relevant discussion. In your example, the matrix would look like this:

	stereo	hifi	hifi	cd player
bob	1	0	0	0
bob	0	1	0	0
alice	0	0	1	0
alice	0	0	0	1

Let’s call this 4×4 matrix M, with M(0,0) denoting the upper-left element. Then:

M(0,1) and M(0,2) are both zero; this is considered a regularization effect.
M(1,2) has the same candidate as M(1,1), and likewise M(2,1) has the same candidate as M(2,2), because both columns are hifi; this is considered an accidental hit.
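To make the accidental-hit case concrete, here is a small plain-Python sketch that rebuilds the matrix above and lists the off-diagonal cells whose candidate matches the row's own positive candidate. The variable names are made up for illustration and this is not the library's implementation.

```python
# Batch from the example: row i pairs queries[i] with candidates[i].
queries = ["bob", "bob", "alice", "alice"]
candidates = ["stereo", "hifi", "hifi", "cd player"]

n = len(candidates)

# Label matrix M: only the diagonal (each row's own pair) is positive.
labels = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# An accidental hit is an off-diagonal cell (i, j) where column j holds
# the same candidate as row i's positive, so it is labelled 0 even though
# it is really the positive item.
accidental_hits = [
    (i, j)
    for i in range(n)
    for j in range(n)
    if i != j and candidates[i] == candidates[j]
]

print(accidental_hits)  # [(1, 2), (2, 1)]
```

As far as I know, `tfrs.tasks.Retrieval` has a `remove_accidental_hits` option that masks such cells when candidate ids are passed to the task, but check the current documentation before relying on that.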