[Question]: Using other metrics such as `AUC`.

See original GitHub issue

In much of the literature and guides outside of this project, AUC seems to be a popular metric for recommender systems. TensorFlow/Keras has an implementation in tf.keras.metrics.AUC.

However, I am not sure about:

  • Is this a valid metric to use for retrieval with this project?
  • Which hyperparameters would be important to consider?
  • Would using it as a batch_metric make sense? (One possible wiring is sketched below.)

Thanks in advance!
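One way this could be wired up, as a minimal sketch: it assumes the retrieval task accepts a list of Keras metrics via a batch_metrics argument, and user_model, item_model, and candidates stand in for your own two-tower pieces (all names here are illustrative, not taken from the issue).

import tensorflow as tf
import tensorflow_recommenders as tfrs

class TwoTowerModel(tfrs.Model):
    # Hypothetical two-tower model: user_model and item_model map raw IDs to
    # embeddings of the same dimension; candidates is a dataset of candidate
    # embeddings for the FactorizedTopK metric.
    def __init__(self, user_model, item_model, candidates):
        super().__init__()
        self.user_model = user_model
        self.item_model = item_model
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(candidates=candidates),
            # AUC computed over the in-batch score matrix of raw dot products.
            batch_metrics=[tf.keras.metrics.AUC(from_logits=True, name="batch_auc")],
        )

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user_id"])
        item_embeddings = self.item_model(features["item_id"])
        return self.task(user_embeddings, item_embeddings)

If the batch labels are the in-batch identity matrix (one positive candidate per query row), this AUC measures how often a query scores its own candidate above the other in-batch candidates, making it a sampled ranking measure rather than a full-corpus one like FactorizedTopK.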

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 15

Top GitHub Comments

3 reactions
patrickorlando commented, May 11, 2022

Just the positives for that batch, @rlcauvin. In practice, sampling a negative that is a positive in another batch doesn't affect performance and provides some mild regularisation.

You have candidate_ids of shape (batch_size, 1) and scores of shape (batch_size, batch_size). Essentially we are just creating a mask: tf.cast(candidate_ids == tf.transpose(candidate_ids), dtype=tf.float32) - tf.eye(batch_size). For example:

candidate_ids = [[0], [1], [2], [0], [3], [4], [3], [5]]
mask = [
  [0, 0, 0, 1, 0, 0, 0, 0], # 0
  [0, 0, 0, 0, 0, 0, 0, 0], # 1
  [0, 0, 0, 0, 0, 0, 0, 0], # 2
  [1, 0, 0, 0, 0, 0, 0, 0], # 0 
  [0, 0, 0, 0, 0, 0, 1, 0], # 3
  [0, 0, 0, 0, 0, 0, 0, 0], # 4
  [0, 0, 0, 0, 1, 0, 0, 0], # 3
  [0, 0, 0, 0, 0, 0, 0, 0]  # 5
]
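A self-contained version of that computation, plus one common way of applying the mask to the in-batch scores (the random scores tensor and the -1e9 trick are illustrative, not necessarily what the library does internally):

import tensorflow as tf

candidate_ids = tf.constant([[0], [1], [2], [0], [3], [4], [3], [5]])  # (batch_size, 1)
batch_size = tf.shape(candidate_ids)[0]

# 1.0 wherever two *different* rows of the batch share the same candidate id.
same_id = tf.cast(candidate_ids == tf.transpose(candidate_ids), tf.float32)
mask = same_id - tf.eye(batch_size)

# Example in-batch score matrix; pushing accidental duplicates to a very
# negative value effectively removes them as negatives.
scores = tf.random.normal((8, 8))
masked_scores = scores - mask * 1e9
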
2 reactions
patrickorlando commented, May 16, 2022

Hey @rlcauvin, I would start with

just using user IDs and item IDs

The model should learn based on just this.

When you experimented with the ranking model, was everything else in your code kept the same? Same lookup layers, embedding layers, tf.data pipelines?

I would:

  1. Ensure that the lookups are working as expected: take a few examples and manually pass them through. Are any items being mapped to the [UNK] token (index 1)? Is the shape correct? The outputs should be one-dimensional, (batch_size,).
  2. Pass them through the embedding layers. Is each row different? Is the shape correct, (batch_size, n_dim)?
  3. Do the matrix multiplication. Are the scores different? Do you get a shape of (batch_size, batch_size)?

The shape is important: if the query and candidate tensors have an extra dimension, the matrix multiplication will produce an incorrect result. Your loss will still decrease, but your model will be junk. (The three checks above are sketched in code below.)
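A rough sketch of those three checks, using hypothetical vocabularies, layer names, and dimensions (adjust the StringLookup settings to your own preprocessing; with mask_token=None as below, out-of-vocabulary IDs map to index 0 rather than 1):

import tensorflow as tf

user_ids = ["u1", "u2", "u3"]                  # hypothetical vocabularies
item_ids = ["i1", "i2", "i3", "i4"]
n_dim = 32

user_lookup = tf.keras.layers.StringLookup(vocabulary=user_ids, mask_token=None)
item_lookup = tf.keras.layers.StringLookup(vocabulary=item_ids, mask_token=None)
user_embedding = tf.keras.layers.Embedding(len(user_ids) + 1, n_dim)
item_embedding = tf.keras.layers.Embedding(len(item_ids) + 1, n_dim)

batch_users = tf.constant(["u1", "u3", "u2"])
batch_items = tf.constant(["i4", "i1", "i9"])  # "i9" is out of vocabulary

# 1. Lookups: shape should be (batch_size,); check how many IDs hit the OOV bucket.
user_idx = user_lookup(batch_users)
item_idx = item_lookup(batch_items)
print(user_idx.shape, item_idx.shape)   # (3,) (3,)
print(item_idx.numpy())                 # the unknown item maps to the OOV index

# 2. Embeddings: shape should be (batch_size, n_dim) and rows should differ.
user_emb = user_embedding(user_idx)
item_emb = item_embedding(item_idx)
print(user_emb.shape, item_emb.shape)   # (3, 32) (3, 32)

# 3. Scores: the matrix multiplication should give (batch_size, batch_size).
scores = tf.matmul(user_emb, item_emb, transpose_b=True)
print(scores.shape)                     # (3, 3)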

