Hybrid model has lower precision@k than pure CF


Hi Maciej,

I’m testing LightFM for a recommendation system for grocery e-commerce (everything you could buy in a convenience store). I compared a LightFM hybrid model against pure collaborative filtering (also LightFM, just without user and item features) and got a lower precision@10 from the hybrid model. I’ve read your paper, which suggests that a hybrid model should outperform a pure CF model, but my experiment shows the opposite.

Here are the descriptions of my approach:

Dataset

  • Interaction matrix: 9915 x 17199; 98% sparsity (or ~2% density)

    • Purchase data of 9915 users across 17199 items.
    • All users have at least 1 transaction during the sample period
  • User features matrix: 9915 x 9930

    • Including an identity matrix and 15 additional features on age, gender, geographic regions, etc
  • Item features matrix: 17199 x 21007

    • Including an identity matrix and 3808 features based on brand name, categories, product descriptions
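The shapes above follow from stacking an identity block next to the metadata columns (9915 + 15 = 9930 for users, 17199 + 3808 = 21007 for items). A minimal sketch of that layout on toy data, assuming SciPy sparse matrices; the helper name `with_identity` and the example metadata columns are made up for illustration and are not taken from the original post:

```python
import numpy as np
import scipy.sparse as sp


def with_identity(metadata):
    """Prepend a per-row identity block to a (rows x n_meta) metadata matrix."""
    n_rows = metadata.shape[0]
    return sp.hstack([sp.identity(n_rows, format="csr"), metadata], format="csr")


# Toy stand-in: 4 users and 3 hypothetical metadata columns
# (for example an age bucket, a gender flag, and a region flag).
user_meta = sp.csr_matrix(np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=np.float32))

user_features = with_identity(user_meta)
print(user_features.shape)  # (4, 7): 4 identity columns + 3 metadata columns
```

With the identity block in place, every user (or item) keeps its own latent vector in addition to the vectors of its shared metadata features.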

Implementation:

  1. Train-test split by timestamp (because a user might re-purchase an item); the interaction matrices were then built using lightfm.data.Dataset.build_interactions()
  2. Training and evaluation:
from lightfm import LightFM
from lightfm.evaluation import precision_at_k

# Shared hyperparameters; each model gets its own LightFM instance
params = dict(loss='warp',
              no_components=80,
              item_alpha=1e-7,
              learning_rate=0.02,
              max_sampled=50)
  • Hybrid model
# Fit with the user and item feature matrices (identity + metadata columns)
model_hybrid = LightFM(**params).fit(train,
                                     item_features=item_features,
                                     user_features=user_features,
                                     epochs=80,
                                     num_threads=4)

hybrid_precision = precision_at_k(model_hybrid, test,
                                  item_features=item_features,
                                  user_features=user_features,
                                  num_threads=4, k=10).mean()
  • Pure CF model
# No feature matrices: LightFM falls back to identity features
model_simple = LightFM(**params).fit(train,
                                     epochs=80,
                                     num_threads=4)

pure_cf_precision = precision_at_k(model_simple, test, num_threads=4, k=10).mean()
  3. Results: hybrid_precision@10 = 0.057814, pureCF_precision@10 = 0.070189. I’ve tried several things to increase the test precision of the hybrid model: using a weight matrix for training, optimizing hyperparameters with grid search, normalizing the user and item features, and weighting the user and item features with TF-IDF. So far, pure CF always outperforms the hybrid model.
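For readers unfamiliar with the metric, precision@10 is the fraction of a user's ten highest-ranked items that appear in that user's test interactions, averaged over users. A minimal NumPy sketch of the idea on dense toy matrices; the function name `precision_at_k_dense` and the data are made up for illustration, and this is not LightFM's implementation. It assumes users without test interactions are left out of the average:

```python
import numpy as np


def precision_at_k_dense(scores, test, k=10):
    """Per-user precision@k from a dense score matrix and a binary test matrix."""
    top_k = np.argsort(-scores, axis=1)[:, :k]      # indices of the k highest-scored items
    hits = np.take_along_axis(test, top_k, axis=1)  # 1 where a top-k item is in the test set
    precision = hits.sum(axis=1) / k
    has_test = test.sum(axis=1) > 0                 # drop users with nothing to predict
    return precision[has_test]


# Toy data: 3 users x 4 items; the third user has no test interactions.
scores = np.array([[0.9, 0.1, 0.8, 0.3],
                   [0.2, 0.7, 0.1, 0.6],
                   [0.5, 0.4, 0.3, 0.2]])
test = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 1],
                 [0, 0, 0, 0]])

per_user = precision_at_k_dense(scores, test, k=2)
print(per_user.mean())  # 0.75: user 0 scores 1/2, user 1 scores 2/2
```

One consequence of this definition is that a user with fewer than k test items can never reach a precision of 1.0, so absolute values such as 0.06 or 0.07 are best read relative to each other.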

Would appreciate any advice on this. Thank you!

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 13

Top GitHub Comments

3 reactions
maciejkula commented, Mar 3, 2020

The culprit here might be that the embeddings for all features of a given item are simply summed to get the final item embedding: the model does not seem to be great at learning which features are important and which are not.
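The summation described here can be sketched in a few lines of NumPy. This is an illustration on random toy embeddings, not LightFM's internal code; the feature indices and dimensions are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, dim = 5, 3

# One latent vector per feature (identity columns and metadata columns alike).
feature_embeddings = rng.normal(size=(n_features, dim))

# One item's feature row: its identity column (0) plus two metadata features (3, 4).
item_row = np.array([1.0, 0.0, 0.0, 1.0, 1.0])

# The item embedding is the weighted sum of the embeddings of its active features.
item_embedding = item_row @ feature_embeddings
print(item_embedding.shape)  # (3,)
```

Because every active feature contributes to the same `dim`-sized vector, a noisy metadata feature shifts the item embedding just as much as an informative one unless its own vector is learned to be close to zero.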

In a more flexible formulation you may want to concatenate the embeddings of different features to get your embedding vector. This should make it more straightforward for the model to simply discard some features.
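A rough sketch of what such a concatenating formulation could look like, again in NumPy on random toy embeddings. LightFM does not offer this; the feature groups, table sizes, and the helper `concat_embedding` are hypothetical and only illustrate the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 2

# Hypothetical feature groups, each with its own embedding table.
tables = {
    "item_id": rng.normal(size=(4, dim)),
    "brand": rng.normal(size=(3, dim)),
    "category": rng.normal(size=(5, dim)),
}

# Active feature indices for one item, per group.
active = {"item_id": [2], "brand": [0], "category": [1, 4]}


def concat_embedding(tables, active):
    """Sum embeddings within each group, then concatenate the group vectors."""
    parts = [tables[group][active[group]].sum(axis=0) for group in tables]
    return np.concatenate(parts)


item_embedding = concat_embedding(tables, active)
print(item_embedding.shape)  # (6,): one dim-sized slice per feature group
```

Each group occupies its own slice of the final vector, so a downstream layer or the matching user vector can down-weight an entire group without disturbing the others.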

Experimenting with different weights for different types of features might give you a lever to optimize this.
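One way to experiment with per-type weights is to scale each block of the feature matrix before passing it to the model, since the matrix values act as weights over features. A minimal sketch assuming SciPy sparse matrices; the helper `weighted_features`, the toy brand and category blocks, and the weights 0.5 and 0.25 are arbitrary illustrations, not recommended settings:

```python
import numpy as np
import scipy.sparse as sp


def weighted_features(identity_weight, blocks):
    """Stack an identity block and metadata blocks, each scaled by its own weight.

    blocks: list of (weight, sparse matrix) pairs sharing the same number of rows.
    """
    n_rows = blocks[0][1].shape[0]
    parts = [identity_weight * sp.identity(n_rows, format="csr")]
    parts += [weight * matrix for weight, matrix in blocks]
    return sp.hstack(parts, format="csr")


# Toy one-hot blocks for 3 items: 2 brands and 3 categories.
brand = sp.csr_matrix(np.array([[1, 0], [0, 1], [1, 0]], dtype=np.float32))
category = sp.csr_matrix(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32))

item_features = weighted_features(1.0, [(0.5, brand), (0.25, category)])
print(item_features.shape)  # (3, 8): 3 identity + 2 brand + 3 category columns
```

The block weights then become extra hyperparameters that can be searched alongside the others, for example starting from a strong identity block and small metadata weights.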

3 reactions
FrancescoI commented, Nov 26, 2019

@kientt15vinid, we A/B tested both models online (in a carousel of “similar items” on the item page), using CTR (click-through rate: number of clicks / number of impressions, as a percentage) as the primary KPI.
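The CTR definition in this comment amounts to the following small calculation. The function name `ctr_percent` and the counts are made up purely to show the arithmetic; they are not results from the thread:

```python
def ctr_percent(clicks: int, impressions: int) -> float:
    """Click-through rate as a percentage: 100 * clicks / impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * clicks / impressions


# Made-up counts for two variants of the same carousel.
variant_a = ctr_percent(clicks=30, impressions=1000)
variant_b = ctr_percent(clicks=45, impressions=1000)
print(variant_a, variant_b)  # 3.0 4.5
```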


