Why does the Retrieval class use an identity matrix for labels?
Hi there,
Thanks so much for releasing and maintaining this code base. It’s really fantastic.
I don’t really have a bug to report, but I do have a question about the Retrieval class and how its loss is calculated. I’ve been working through the tutorials, focusing on the basic retrieval example, and I understand that by default the task uses a categorical cross-entropy loss.
Obviously, this implies having a label and predicted probabilities which I can see in the Retrieval class.
From that class, the scores are the matrix multiplication of the query and candidate embeddings:
```python
scores = tf.linalg.matmul(
    query_embeddings, candidate_embeddings, transpose_b=True)
```
Then, the labels are derived as:
```python
labels = tf.eye(num_queries, num_candidates)
```
Which is then passed to tf.keras.losses.CategoricalCrossentropy to calculate the loss:
```python
loss = self._loss(y_true=labels, y_pred=scores, sample_weight=sample_weight)
```
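To make sure I’m reading it right, here is the computation reduced to a runnable toy (a batch of 2 with random embeddings standing in for the tower outputs; I believe the default loss in tfrs is built with `from_logits=True` since the scores are raw dot products):

```python
import tensorflow as tf

# Toy batch: 2 queries and their 2 positive candidates, embedding dim 4.
# Random values stand in for what the query/candidate towers would produce.
query_embeddings = tf.random.normal((2, 4))
candidate_embeddings = tf.random.normal((2, 4))

# Score every query against every candidate in the batch: shape (2, 2).
scores = tf.linalg.matmul(
    query_embeddings, candidate_embeddings, transpose_b=True)

# Identity labels: the positive for query i is candidate i of the same batch row.
labels = tf.eye(2, 2)

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_fn(y_true=labels, y_pred=scores))
```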
What I don’t quite understand is why is the identity matrix used as the labels? Doesn’t this imply that user_i has selected candidate_i? Or am I missing something?
If I relate this back to the basic retrieval example, then would the scores matrix be of size number_unique_user_ids x number_unique_movie_ids? Likewise, the rows of the loss matrix would relate to a user_id and the columns to a candidate. Wouldn’t this imply that user_1 reviewed candidate_1, etc…?
Apologies if this is a fairly basic question, but I’m quite new to Tensorflow. Would appreciate any feedback or references. I’ve tried looking at the issues here and also on stackoverflow, but haven’t really been able to find anything. Thanks.
Not quite.
Let’s imagine you have 3 users and 4 items, and that the positive interactions are stored as a table of (user, item) pairs, one row per interaction. The rows referenced below are: row 1 = (user 1, item 3), row 4 = (user 2, item 3), and row 5 = (user 3, item 2).
Assume a batch size of 2, and a query/candidate embedding of size 12. We randomly sample 2 rows from the table above, rows 1 and 5.
Your queries will be a matrix of shape (2, 12). The first row will be the query for user 1, and the second row for user 3. Your candidates will be a matrix of shape (2, 12). The first row will be for item 3, and the second for item 2.
When you perform the dot product between queries and candidates you get a score matrix of shape (2, 2). The rows are the users, and the columns are the items.
The diagonal of this matrix contains the scores for the positive interactions we sampled. All the other elements are the scores between that query and the positive items of the other examples in the batch, which we then use as negatives for that example.
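To make that concrete, here is a toy version of the batch above (the embedding values are random stand-ins, so only the shapes and the positions of the positives matter):

```python
import tensorflow as tf

# Batch sampled from rows 1 and 5: (user 1, item 3) and (user 3, item 2).
queries = tf.random.normal((2, 12))     # row 0 -> user 1, row 1 -> user 3
candidates = tf.random.normal((2, 12))  # row 0 -> item 3, row 1 -> item 2

scores = tf.linalg.matmul(queries, candidates, transpose_b=True)  # shape (2, 2)

# scores[0, 0]: user 1 vs item 3 -> sampled positive
# scores[1, 1]: user 3 vs item 2 -> sampled positive
# scores[0, 1]: user 1 vs item 2 -> in-batch negative
# scores[1, 0]: user 3 vs item 3 -> in-batch negative
labels = tf.eye(2)  # the diagonal marks the positives
```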
This matrix is not representative of the global interaction matrix. Suppose we instead sample a batch of rows 1 and 4. In this case both users interacted with item 3, so both columns of the resulting 2 x 2 score matrix are scores against the same item.
Here, the in-batch negative for each positive is the very same item as that positive. This is an accidental hit, and the tfrs library has the ability to remove these hits if you pass the candidate ids in (via the `remove_accidental_hits` option).
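Roughly like this, assuming a recent tfrs version where `tfrs.tasks.Retrieval` accepts `remove_accidental_hits` (the tensors are again toy stand-ins):

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Toy batch of 2 where both sampled positives are the same item (id 3),
# reproducing the accidental-hit situation above.
query_embeddings = tf.random.normal((2, 12))
candidate_embeddings = tf.random.normal((2, 12))
candidate_ids = tf.constant([3, 3])

task = tfrs.tasks.Retrieval(remove_accidental_hits=True)

# Passing candidate_ids lets the task mask out any in-batch negative that
# shares an id with the row's positive, so it isn't treated as a false negative.
loss = task(query_embeddings, candidate_embeddings, candidate_ids=candidate_ids)
```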
So in summary, each row of the scores matrix corresponds to a single (user, item) pair. The diagonal is the score for that pair, and all other columns are negatives sampled from the other pairs in that same mini-batch.
Thank you for the wonderful explanations, @patrickorlando!