
Interpreting prediction scores


I built a recommendation model on a user-item transactional dataset, where each transaction is represented by a 1 in the interaction matrix.

from lightfm import LightFM

model = LightFM(learning_rate=0.05, loss='warp')
model.fit(train, epochs=NUM_EPOCHS, num_threads=NUM_THREADS)
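For context, metrics like the ones below are typically produced with lightfm.evaluation; a minimal sketch of the evaluation calls, assuming train and test are the interaction matrices:

from lightfm.evaluation import auc_score, precision_at_k, recall_at_k

# Passing train_interactions excludes known training positives
# from the test-time rankings so they are not counted as misses.
train_auc = auc_score(model, train, num_threads=NUM_THREADS).mean()
test_auc = auc_score(model, test, train_interactions=train,
                     num_threads=NUM_THREADS).mean()
test_prec = precision_at_k(model, test, train_interactions=train,
                           k=3, num_threads=NUM_THREADS).mean()
test_rec = recall_at_k(model, test, train_interactions=train,
                       k=3, num_threads=NUM_THREADS).mean()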

Here are the results of the evaluation:

Train AUC score: 0.978294
Test AUC score:  0.810757

Train precision at k=3: 0.115301
Test precision at k=3:  0.0209936

Train recall at k=3: 0.238312330233
Test recall at k=3:  0.0621618086561

Can anyone help me interpret these results? How is it that I am getting such a good AUC score but such bad precision/recall? The precision/recall gets even worse with the ‘bpr’ (Bayesian personalized ranking) loss.

Prediction task

import numpy as np

users = [0]
items = np.array([13433, 13434, 13435])
model.predict(users, items)

Result

array([-1.45337546, -1.39952552, -1.44265926])

How do I interpret the prediction scores?

Another problem: if I split the data into train and test matrices of unequal shape (different numbers of users), I get the error below.

print(train.shape)
print(test.shape)

(88741, 18109)
(29581, 18109)

from lightfm.evaluation import auc_score

test_auc = auc_score(model, test, train_interactions=train, num_threads=NUM_THREADS).mean()

Error:

/usr/local/lib/python3.5/site-packages/lightfm/lightfm.py in predict_rank(self, test_interactions, train_interactions, item_features, user_features, num_threads)
    658 
    659         if not user_features.shape[1] == self.user_embeddings.shape[0]:
--> 660             raise ValueError('Incorrect number of features in user_features')
    661 
    662         test_interactions = test_interactions.tocsr()

ValueError: Incorrect number of features in user_features
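The traceback arises because auc_score (and the model itself) expects the test matrix to have the same shape as the matrix the model was trained on: same users, same items, with only the nonzero interactions differing between the two. The library ships a helper for exactly this split; a sketch, assuming interactions is the full user-item matrix before splitting:

import numpy as np
from lightfm.cross_validation import random_train_test_split

# Split the nonzero entries of one interactions matrix into two
# matrices that keep identical (n_users, n_items) dimensions.
train, test = random_train_test_split(
    interactions,
    test_percentage=0.25,
    random_state=np.random.RandomState(42),
)
assert train.shape == test.shape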

Thanks

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 12

Top GitHub Comments

maciejkula commented, Feb 1, 2017 (3 reactions)
  1. That’s because the model estimates a latent representation for user 0. If you tried to use that representation for any user other than 0, the results may not be very good.
  2. This depends on the training method. In general, negative examples are sampled from the set of items where no interaction was observed. You can read the BPR and WARP papers (linked from the docs) for more details; a rough sketch of WARP-style sampling follows this list.
  3. Biases mostly encode item popularity, so it looks like popularity is extremely important in your data. In general I’d treat a result like this with extreme suspicion: your estimation or evaluation may well be faulty.
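To make point 2 concrete, here is a rough sketch of WARP-style negative sampling, simplified from the paper rather than taken from LightFM's actual (Cython) implementation; the function name and signature here are my own:

import numpy as np

def warp_negative_sample(scores, positive, interacted, max_trials=100):
    # scores: model scores for every item, for one user
    # positive: index of the observed (positive) item
    # interacted: set of item indices the user has interacted with
    n_items = len(scores)
    for trial in range(1, max_trials + 1):
        negative = np.random.randint(n_items)
        if negative in interacted:
            continue  # negatives come only from unobserved items
        # A "violation": the negative scores within a unit margin of
        # the positive, so this pair produces a gradient update.
        if scores[negative] > scores[positive] - 1:
            # Fewer trials needed implies the positive is ranked low,
            # giving a larger weight (the log-rank approximation).
            weight = np.log(max((n_items - 1) // trial, 1))
            return negative, weight
    return None, 0.0  # no violating negative found; skip the update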
maciejkula commented, Sep 20, 2017 (1 reaction)

It gives you the position of item X in a list where you have ordered all items by how strongly they are recommended (so yes, 42 is a measure of ‘goodness’, and smaller numbers are better, because they are closer to the head of the list). If you are actually serving recommendations, though, you should use the predict method, which gives you a score where higher is better.

Recommendations are a ranking problem, not a classification problem, so things like thresholds don’t really make sense. The idea is that you have, say, 4 slots to fill in your interface, and you sort all products by their recommendation score and pick the top 4; you never set a threshold.
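In code, filling those 4 slots might look like the sketch below (assuming train is the interaction matrix from above, so train.shape[1] is the item count):

import numpy as np

# Rank all items for one user and keep the top 4; the raw scores
# only need to be comparable to each other within this user.
n_items = train.shape[1]  # e.g. 18109 in the matrices above
scores = model.predict(0, np.arange(n_items))
top_items = np.argsort(-scores)[:4]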

Reading the references is a good start; the BPR paper is a classic. There is also this, but I can’t say anything about how good it is.


