Hybrid model has lower Precision@K compared to pure CF
Hi Maciej,
I’m testing LightFM for a grocery e-commerce recommendation system (everything you could buy in a convenience store). I compared the LightFM hybrid model against pure collaborative filtering (also LightFM, just without user and item features) and got a lower precision@10 for the hybrid model. I’ve read your paper, and it seems to indicate that a hybrid model should outperform a pure CF model, but my experiment shows the opposite.
Here is a description of my approach:
Dataset
- Interaction matrix: 9915 x 17199; ~98% sparsity (~2% density)
  - Purchase data of 9915 users across 17199 items.
  - All users have at least 1 transaction during the sample period.
- User features matrix: 9915 x 9930
  - An identity matrix plus 15 additional features on age, gender, geographic region, etc.
- Item features matrix: 17199 x 21007
  - An identity matrix plus 3808 features based on brand name, categories, and product descriptions.
Implementation:
- Train-test data split by timestamp (because a user might re-purchase an item), then the interaction matrix was built using
lightfm.data.Dataset.build_interactions()
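The time-based split described above can be sketched in plain Python (hypothetical field names and toy data; in practice `Dataset.build_interactions()` would then be called on each half):

```python
from datetime import datetime

# Hypothetical raw purchase log: (user_id, item_id, timestamp) tuples.
purchases = [
    ("u1", "milk",  datetime(2021, 1, 5)),
    ("u1", "bread", datetime(2021, 3, 2)),
    ("u2", "milk",  datetime(2021, 2, 10)),
    ("u2", "eggs",  datetime(2021, 4, 1)),
]

# Split by a cutoff date rather than randomly, so a repeat purchase of the
# same item never leaks from the future (test) into the past (train).
cutoff = datetime(2021, 3, 1)
train_rows = [(u, i) for u, i, t in purchases if t < cutoff]
test_rows = [(u, i) for u, i, t in purchases if t >= cutoff]

print(train_rows)  # interactions before the cutoff
print(test_rows)   # interactions on/after the cutoff
```

A time-based split is the right call here: a random split would put some of a user's repeat purchases of the same item in both halves, inflating test metrics.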
- Training and evaluation:
model = LightFM(loss='warp',
                no_components=80,
                item_alpha=1e-7,
                learning_rate=0.02,
                max_sampled=50)
- Hybrid model
model_hybrid = model.fit(train,
                         item_features=item_features,
                         user_features=user_features,
                         epochs=80,
                         num_threads=4)
test_precision = precision_at_k(model_hybrid, test, item_features=item_features,
                                user_features=user_features, num_threads=4, k=10).mean()
- Pure CF model
model_simple = model.fit(train,
                         epochs=80,
                         num_threads=4)
test_precision = precision_at_k(model_simple, test, num_threads=4, k=10).mean()
- Results: hybrid precision@10 = 0.057814, pure CF precision@10 = 0.070189. I’ve tried several things to increase the hybrid model’s test precision, including: using a sample-weight matrix for training, optimizing hyperparameters with grid search, normalizing the user and item features, and weighting the user and item features with TF-IDF. So far, pure CF always outperforms the hybrid model.
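The TF-IDF weighting mentioned above can be sketched in plain Python on a toy binary item-feature matrix (toy data; in practice this would be applied to the sparse `item_features` matrix, e.g. with scikit-learn's `TfidfTransformer`):

```python
import math

# Toy binary item-feature matrix: 3 items x 4 features (e.g. brand/category tags).
features = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

n_items = len(features)
n_feats = len(features[0])

# Document frequency of each feature: in how many items it appears.
df = [sum(row[j] for row in features) for j in range(n_feats)]

# Smoothed IDF, following scikit-learn's convention: log((1 + n) / (1 + df)) + 1.
idf = [math.log((1 + n_items) / (1 + df[j])) + 1.0 for j in range(n_feats)]

# Reweight: the ubiquitous feature 0 keeps weight 1.0, while rare,
# discriminative features are boosted.
tfidf = [[row[j] * idf[j] for j in range(n_feats)] for row in features]

print(tfidf[0])
```

The point of the reweighting is that features shared by almost every item carry little signal, so they should contribute less to the summed item embedding.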
Would appreciate any advice on this. Thank you!
Issue Analytics
- State:
- Created 4 years ago
- Comments: 13
Top Results From Across the Web
- Item cold-start: recommending StackExchange questions
  A hybrid model As before, let's sanity check the model on the training set. Note that the training set AUC is lower than...
- Hybrid Modeling and Intensified DoE: An Approach to ...
  The fractional-factorial hybrid model demonstrates inferior accuracy and precision compared to the intensified approach.
- Hybrid Collaborative Filtering Algorithms Using a Mixture of ...
  Collaborative filtering (CF) is one of the most successful approaches for recommendation. In this paper, we propose two hybrid CF algorithms...
- Personalized Restaurant Recommender System Using A ...
  As we can see, WARP performs best both in a pure CF setting as well as a hybrid setting. Logistic loss seems to...
- Decision making based on hybrid modeling approach applied ...
  Hybrid modeling has become an attractive and sometimes necessary ... is degrading compared to the pure polymer, CM- k CF, a partial...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The culprit here might be that the embeddings for all features of a given item are simply summed to get the final item embedding: the model does not seem to be great at learning which features are important and which are not.
In a more flexible formulation you may want to concatenate the embeddings of different features to get your embedding vector. This should make it more straightforward for the model to simply discard some features.
Experimenting with different weights for different types of features might give you a lever to optimize this.
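The summation point above can be illustrated with a toy numeric sketch (made-up 2-d embeddings, not LightFM's actual code): the item representation is the element-wise sum of its feature embeddings, so one noisy feature shifts every dimension, whereas concatenation would confine it to its own slots.

```python
# Toy 2-d embeddings for three features of one item (made-up numbers).
identity_emb = [1.0, 0.0]   # the item's own identity feature
brand_emb = [0.5, 0.5]      # an informative metadata feature
noisy_emb = [-2.0, 3.0]     # an uninformative, noisy metadata feature

# LightFM-style representation: element-wise sum of all feature embeddings.
summed = [a + b + c for a, b, c in zip(identity_emb, brand_emb, noisy_emb)]
print(summed)  # every dimension is shifted by the noisy feature

# A concatenated formulation keeps each feature in its own coordinates,
# so a downstream model could learn to ignore the noisy slots.
concatenated = identity_emb + brand_emb + noisy_emb
print(concatenated)
```

In LightFM itself, the summed representation for each item can be inspected with `model.get_item_representations(features=item_features)`.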
@kientt15vinid, we’ve A/B tested both models online (a carousel of “similar items” to the main one on the item page), using CTR (click-through rate: number of clicks / number of impressions) as the primary KPI.