
model.predict() and embedding multiplication gave different results

See original GitHub issue

Hello! I get nearly, but not exactly, the same results when I use the native lightfm.LightFM.predict and manual embedding multiplication (they agree only up to ~1e-6), and I can't understand why they differ. It affects downstream ranking functions. I read #617 and #474 but couldn't find the answer to my question.

import numpy as np
from numpy.testing import assert_allclose
import lightfm
from lightfm.datasets import fetch_movielens


# Train a plain LightFM model on the MovieLens 100k dataset.
data = fetch_movielens(min_rating=5.0)
model = lightfm.LightFM()
model.fit(interactions=data['train'], epochs=50)


def lfm_dot_product(uid, lfm_model):
    # Reconstruct scores manually: user_factors . item_factors^T
    # plus the user bias and the per-item biases.
    item_biases, item_factors = lfm_model.get_item_representations()
    user_biases, user_factors = lfm_model.get_user_representations()
    scores = user_factors[uid].dot(item_factors.T)
    scores += user_biases[uid]
    scores += item_biases
    return scores


testee_result = lfm_dot_product(0, model)
truth_result = model.predict(0, np.arange(data['train'].shape[1]))

# Both paths produce float32 scores; they should match exactly if the
# computations were identical.
assert testee_result.dtype == truth_result.dtype == np.float32
assert_allclose(testee_result, truth_result, rtol=0, atol=1e-7)

The outputs are almost identical (with atol=1e-6 the test passes):

AssertionError: 
Not equal to tolerance rtol=0, atol=1e-07

Mismatched elements: 1005 / 1682 (59.8%)
Max absolute difference: 1.4305115e-06
Max relative difference: 3.082396e-07
 x: array([5.848966, 3.664918, 3.890586, ..., 2.37985 , 2.37596 , 2.414136],
      dtype=float32)
 y: array([5.848967, 3.664918, 3.890586, ..., 2.37985 , 2.37596 , 2.414137],
      dtype=float32)
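
For scale, the observed difference is only a few float32 ULPs at this score magnitude. A quick check (a sketch using only NumPy; the numbers are taken from the output above):

import numpy as np

# Spacing between adjacent float32 values near a score of ~5.85.
ulp = np.spacing(np.float32(5.848966))   # ~4.77e-07

# The reported max absolute difference is roughly 3 ULPs at this
# magnitude, i.e. within normal float32 rounding noise.
print(1.4305115e-06 / ulp)               # ~3.0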

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
viyx commented, May 16, 2022

I think this is due to NumPy's optimized linear-algebra routines. I tried NumPy's dot product and a manual element-by-element product (similar to https://github.com/lyst/lightfm/blob/master/lightfm/_lightfm_fast.pyx.template#L320-L334) and got different results too, so the discrepancy comes from NumPy's accelerated computation. I also found that float64 arrays don't have this issue, but that doesn't help here, since we can't change the embedding dtype in the LightFM model class.
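
The comment doesn't include the exact test, but a minimal sketch of the kind of comparison described might look like this (the dimensions, random data, and loop structure are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
user = rng.random(64, dtype=np.float32)
items = rng.random((1682, 64), dtype=np.float32)

# BLAS-backed dot product: vectorized, and free to reorder the accumulation.
fast = user @ items.T

# Manual sequential accumulation in float32, mimicking the loop in
# _lightfm_fast.pyx.template.
slow = np.zeros(items.shape[0], dtype=np.float32)
for i in range(items.shape[0]):
    acc = np.float32(0.0)
    for k in range(user.shape[0]):
        acc += user[k] * items[i, k]
    slow[i] = acc

# The two float32 results differ by a few ULPs because the summation
# order differs.
print(np.abs(fast - slow).max())  # typically on the order of 1e-6

# Against a float64 reference, both float32 paths are off by similarly
# tiny amounts, consistent with rounding noise rather than a bug.
ref = user.astype(np.float64) @ items.T.astype(np.float64)
print(np.abs(fast - ref).max(), np.abs(slow - ref).max())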

0 reactions
viyx commented, May 18, 2022

Because it affects ranking, the top-k lists sometimes differ: sum(ranks_from_lightfm != ranks_from_numpy) / len(ranks_from_numpy) ≈ 0.01.
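
The comment doesn't show how the ranks were computed; a plausible reconstruction (the double-argsort ranking is an assumption, and the variables continue from the reproduction script above):

import numpy as np

# Score user 0 with both paths, as in the reproduction script.
scores_lightfm = model.predict(0, np.arange(data['train'].shape[1]))
scores_numpy = lfm_dot_product(0, model)

# Rank of each item under each scoring path (0 = highest score).
ranks_from_lightfm = np.argsort(np.argsort(-scores_lightfm))
ranks_from_numpy = np.argsort(np.argsort(-scores_numpy))

# Fraction of items whose rank differs between the two paths;
# the commenter reports roughly 0.01.
print(np.sum(ranks_from_lightfm != ranks_from_numpy) / len(ranks_from_numpy))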

Read more comments on GitHub >

Top Results From Across the Web

model.predict() and embedding multiplication gave different ...
I have trained a hybrid model. What is the best way to generate predictions for every item-user pair and rank them?
Read more >
Why Do I Get Different Results Each Time in Machine Learning?
The machine learning models may be different each time they are trained. In turn, the models may make different predictions, and when evaluated, …
Read more >
Why does embedding vector multiplied by a constant in ...
Looking around it, I found this argument 1: The reason we increase the embedding values before the addition is to make the positional …
Read more >
Understanding the Evaluation — pykeen 1.9.0 documentation
This part of the tutorial is aimed to help you understand the evaluation of knowledge graph embeddings. In particular it explains rank-based evaluation …
Read more >
Using Embeddings to Make Complex Data Simple - Toptal
A typical machine learning model expects its features to be numbers, ... If you'd like to see what other embeddings are out there,...
Read more >
