[ASK] Discovered a strange behavior on ranking metrics

Description

I tried to evaluate some MF models (and UserKNN) with ranking metrics on the MovieLens dataset (100k). The results do not look as expected (first table), so I implemented the evaluation step with sklearn.metrics and the output looks much more realistic (second table). The implementation can be found here: https://gist.github.com/tpoerschke/1d823e1b9dbc0f290c763854e9fa2a52.

The implementation of my evaluation should be similar to that in Cornac. The metrics are evaluated per user and then averaged over all users.

Am I missing something here? Or is this a bug?

TEST:
...
        |  F1@-1 | Precision@-1 | Recall@-1 | Train (s) | Test (s)
------- + ------ + ------------ + --------- + --------- + --------
PMF     | 0.0143 |       0.0073 |    1.0000 |    6.4883 |   0.3277
NMF     | 0.0143 |       0.0073 |    1.0000 |    1.4425 |   0.4025
SVD     | 0.0143 |       0.0073 |    1.0000 |    0.4447 |   0.4037
UserKNN | 0.0143 |       0.0073 |    1.0000 |    0.1430 |   5.7097


CUSTOM EVALUATION
Model      F1 score    Precision    Recall    Train (s)
-------  ----------  -----------  --------  -----------
PMF          0.3498       0.3226    0.4506       6.8282
NMF          0.3243       0.3334    0.3707       1.4856
SVD          0.3423       0.3158    0.4398       0.3843
UserKNN      0.2628       0.2688    0.3319       0.0957

System

OS: macOS Catalina (10.15.7)
Python: 3.8.3 [Clang 10.0.0 ] :: Anaconda, Inc. on darwin


Top GitHub Comments

tqtg commented, Nov 28, 2020

@tpoerschke The number of hits that counts toward precision is capped by the number of ground-truth items. Recommending more items increases the chance of hitting correct ones, but the total number of recommendations made by the model, which is the normalization term of precision, grows as well.

First of all, I think you should be clear about how those ranking metrics are used to evaluate top-k recommendations. The book I mentioned in the previous comment is one of the good references.
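
To make the normalization point concrete, here is a minimal sketch of per-user top-k metrics. It is not Cornac's implementation; the function name `precision_recall_f1_at_k` and the toy user data are hypothetical, and it assumes that `k=-1` means "evaluate the whole ranked list", as the `@-1` column headers in the table suggest.

```python
def precision_recall_f1_at_k(ranked_items, relevant_items, k=-1):
    """Top-k ranking metrics for a single user (illustrative, not Cornac's code).

    k=-1 is assumed to mean "use the entire ranked list".
    """
    top_k = ranked_items if k < 0 else ranked_items[:k]
    relevant = set(relevant_items)
    hits = sum(1 for item in top_k if item in relevant)

    # Precision is normalized by the number of recommendations made,
    # recall by the number of ground-truth items.
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical user: 1000 ranked candidate items, 10 relevant items in the test set.
ranked = list(range(1000))
relevant = [3, 17, 42, 99, 250, 400, 512, 640, 777, 901]

p_all, r_all, f1_all = precision_recall_f1_at_k(ranked, relevant, k=-1)
p_50, r_50, f1_50 = precision_recall_f1_at_k(ranked, relevant, k=50)

print(f"k=-1: precision={p_all:.4f} recall={r_all:.4f} f1={f1_all:.4f}")
print(f"k=50: precision={p_50:.4f} recall={r_50:.4f} f1={f1_50:.4f}")
```

With the full list, every relevant item is eventually hit, so recall is 1 while precision shrinks to the fraction of relevant items among all candidates. That matches the pattern in the first table (Recall@-1 of 1.0000 with a Precision@-1 close to zero), independent of how good the ranking is.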

tpoerschke commented, Nov 28, 2020

@tqtg That will be the next step. Once it works for the entire set, I’m going to limit it to the top K items. But shouldn’t the precision also be quite high when evaluating over the entire set? A value near zero just feels wrong.
