
model.predict() and embedding multiplication gave different results

See original GitHub issue

Hello! I get nearly, but not exactly, the same results when I use the native lightfm.LightFM.predict and manual embedding multiplication (they agree only up to ~1e-6), and I can't understand why they differ. It affects downstream ranking functions. I read #617 and #474 but couldn't find the answer to my question.

import numpy as np
from numpy.testing import assert_allclose
import lightfm
from lightfm.datasets import fetch_movielens


# Train a plain LightFM model on the MovieLens 100k dataset.
data = fetch_movielens(min_rating=5.0)
model = lightfm.LightFM()
model.fit(interactions=data['train'], epochs=50)


def lfm_dot_product(uid, lfm_model):
    # Reconstruct scores manually: user_factors . item_factors^T
    # plus the user bias and the per-item biases.
    item_biases, item_factors = lfm_model.get_item_representations()
    user_biases, user_factors = lfm_model.get_user_representations()
    scores = user_factors[uid].dot(item_factors.T)
    scores += user_biases[uid]
    scores += item_biases
    return scores


testee_result = lfm_dot_product(0, model)
truth_result = model.predict(0, np.arange(data['train'].shape[1]))

# Both paths produce float32 scores; they should match exactly if the
# computations were identical.
assert testee_result.dtype == truth_result.dtype == np.float32
assert_allclose(testee_result, truth_result, rtol=0, atol=1e-7)

The outputs are almost identical (with atol=1e-6 the test passes):

AssertionError: 
Not equal to tolerance rtol=0, atol=1e-07

Mismatched elements: 1005 / 1682 (59.8%)
Max absolute difference: 1.4305115e-06
Max relative difference: 3.082396e-07
 x: array([5.848966, 3.664918, 3.890586, ..., 2.37985 , 2.37596 , 2.414136],
      dtype=float32)
 y: array([5.848967, 3.664918, 3.890586, ..., 2.37985 , 2.37596 , 2.414137],
      dtype=float32)
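
For scale, the observed difference is only a few float32 ULPs at this score magnitude. A quick check (a sketch using only NumPy; the numbers are taken from the output above):

import numpy as np

# Spacing between adjacent float32 values near a score of ~5.85.
ulp = np.spacing(np.float32(5.848966))   # ~4.77e-07

# The reported max absolute difference is roughly 3 ULPs at this
# magnitude, i.e. within normal float32 rounding noise.
print(1.4305115e-06 / ulp)               # ~3.0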

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
viyx commented, May 16, 2022

I think this is due to NumPy's optimized linear-algebra routines. I tried NumPy's dot product and a manual element-by-element product (similar to https://github.com/lyst/lightfm/blob/master/lightfm/_lightfm_fast.pyx.template#L320-L334) and got different results too, so the discrepancy comes from NumPy's accelerated computation. I also found that float64 arrays don't have this issue, but that doesn't help here, since we can't change the embedding dtype in the LightFM model class.
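
The comment doesn't include the exact test, but a minimal sketch of the kind of comparison described might look like this (the dimensions, random data, and loop structure are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
user = rng.random(64, dtype=np.float32)
items = rng.random((1682, 64), dtype=np.float32)

# BLAS-backed dot product: vectorized, and free to reorder the accumulation.
fast = user @ items.T

# Manual sequential accumulation in float32, mimicking the loop in
# _lightfm_fast.pyx.template.
slow = np.zeros(items.shape[0], dtype=np.float32)
for i in range(items.shape[0]):
    acc = np.float32(0.0)
    for k in range(user.shape[0]):
        acc += user[k] * items[i, k]
    slow[i] = acc

# The two float32 results differ by a few ULPs because the summation
# order differs.
print(np.abs(fast - slow).max())  # typically on the order of 1e-6

# Against a float64 reference, both float32 paths are off by similarly
# tiny amounts, consistent with rounding noise rather than a bug.
ref = user.astype(np.float64) @ items.T.astype(np.float64)
print(np.abs(fast - ref).max(), np.abs(slow - ref).max())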

0 reactions
viyx commented, May 18, 2022

Because it affects ranking, the top-k lists sometimes differ: sum(ranks_from_lightfm != ranks_from_numpy) / len(ranks_from_numpy) ≈ 0.01.
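
The comment doesn't show how the ranks were computed; a plausible reconstruction (the double-argsort ranking is an assumption, and the variables continue from the reproduction script above):

import numpy as np

# Score user 0 with both paths, as in the reproduction script.
scores_lightfm = model.predict(0, np.arange(data['train'].shape[1]))
scores_numpy = lfm_dot_product(0, model)

# Rank of each item under each scoring path (0 = highest score).
ranks_from_lightfm = np.argsort(np.argsort(-scores_lightfm))
ranks_from_numpy = np.argsort(np.argsort(-scores_numpy))

# Fraction of items whose rank differs between the two paths;
# the commenter reports roughly 0.01.
print(np.sum(ranks_from_lightfm != ranks_from_numpy) / len(ranks_from_numpy))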

Read more comments on GitHub >

Top Results From Across the Web

model.predict() and embedding multiplication gave different ...
I have trained a hybrid model. What is the best way to generate predictions for every item-user pair and rank them?
Read more >
Why Do I Get Different Results Each Time in Machine Learning?
The machine learning models may be different each time they are trained. In turn, the models may make different predictions, and when evaluated, …
Read more >
Why does embedding vector multiplied by a constant in ...
Looking around it, I found this argument 1: The reason we increase the embedding values before the addition is to make the positional …
Read more >
Understanding the Evaluation — pykeen 1.9.0 documentation
This part of the tutorial is aimed to help you understand the evaluation of knowledge graph embeddings. In particular it explains rank-based evaluation …
Read more >
Using Embeddings to Make Complex Data Simple - Toptal
A typical machine learning model expects its features to be numbers, ... If you'd like to see what other embeddings are out there,...
Read more >
