Collaborative Filtering outperforming Hybrid

2019-03-05 21:02:41,609 [MainThread  ] [INFO ]  Begin fitting collaborative filtering model...
2019-03-05 21:02:41,688 [MainThread  ] [INFO ]  Collaborative Filtering training set AUC: 0.93749386
2019-03-05 21:02:41,707 [MainThread  ] [INFO ]  Collaborative Filtering test set AUC: 0.9080546
2019-03-05 21:02:41,751 [MainThread  ] [INFO ]  Collaborative Filtering training set Precision@10: 0.5878049
2019-03-05 21:02:41,765 [MainThread  ] [INFO ]  Collaborative Filtering test set Precision@10: 0.103797466
2019-03-05 21:02:41,808 [MainThread  ] [INFO ]  Collaborative Filtering training set Recall@10: 0.15788174721859297
2019-03-05 21:02:41,822 [MainThread  ] [INFO ]  Collaborative Filtering test set Recall@10: 0.11484959933052774
2019-03-05 21:02:41,823 [MainThread  ] [INFO ]  Collaborative Filtering training set F1 Score: 0.24890794393000912
2019-03-05 21:02:41,823 [MainThread  ] [INFO ]  Collaborative Filtering test set F1 Score: 0.10904420177996955
2019-03-05 21:02:41,867 [MainThread  ] [INFO ]  Collaborative Filtering training set MRR: 0.8419941
2019-03-05 21:02:41,881 [MainThread  ] [INFO ]  Collaborative Filtering test set MRR: 0.23103695
2019-03-05 21:02:41,881 [MainThread  ] [INFO ]  Begin fitting hybrid model...
2019-03-05 21:02:45,425 [MainThread  ] [INFO ]  Hybrid training set AUC: 0.89809555
2019-03-05 21:02:45,791 [MainThread  ] [INFO ]  Hybrid test set AUC: 0.88973016
2019-03-05 21:02:46,370 [MainThread  ] [INFO ]  Hybrid training set Precision@10: 0.41646343
2019-03-05 21:02:46,773 [MainThread  ] [INFO ]  Hybrid test set Precision@10: 0.09050632
2019-03-05 21:02:47,336 [MainThread  ] [INFO ]  Hybrid training set Recall@10: 0.07391322142347932
2019-03-05 21:02:47,719 [MainThread  ] [INFO ]  Hybrid test set Recall@10: 0.06227174211311939
2019-03-05 21:02:47,719 [MainThread  ] [INFO ]  Hybrid training set F1 Score: 0.12554494052412787
2019-03-05 21:02:47,719 [MainThread  ] [INFO ]  Hybrid test set F1 Score: 0.07378004680470172
2019-03-05 21:02:48,303 [MainThread  ] [INFO ]  Hybrid training set MRR: 0.6455854
2019-03-05 21:02:48,689 [MainThread  ] [INFO ]  Hybrid test set MRR: 0.2648911
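
As a sanity check on the log above, the F1 values appear to be the harmonic mean of the logged Precision@10 and Recall@10. This is inferred from the numbers rather than stated in the issue, and the helper name `f1_score` below is hypothetical. A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision@10, Recall@10) pairs copied from the log above.
logged = {
    "CF train": (0.5878049, 0.15788174721859297),
    "CF test": (0.103797466, 0.11484959933052774),
    "Hybrid train": (0.41646343, 0.07391322142347932),
    "Hybrid test": (0.09050632, 0.06227174211311939),
}

for name, (p, r) in logged.items():
    print(f"{name}: F1 = {f1_score(p, r):.6f}")
```

Since F1 here is derived from the other two metrics, it adds no independent evidence; the CF vs. hybrid comparison rests on AUC, Precision@10, Recall@10 and MRR.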

Dataset format: Data JSON

My interactions are between users and symbols. Each symbol has item metadata: item_sector (around 9 values in total), item_industry (around 210 values in total), and other metadata such as trending score and watchlist count, both of which I've normalised.

What could cause the CF model to outperform the hybrid? I thought there might be too many item features, so I removed trending score and watchlist count and kept only item industry and sector to classify/group each symbol.

Top GitHub Comments

maciejkula commented, Jun 11, 2019

(number_items x [number_items + number_features]) is correct.

On Sat, 8 Jun 2019 at 22:06, DaStapo wrote:

@SimonCW (https://github.com/SimonCW) are you saying that the shape has to have exactly the same dimensions, both equal to the number of items? In my case, the matrix returned by build_item_features() has the shape 2035x2038. Is that not okay, since the dimensions are not equal? The dimensions seem to be (number_items x [number_items + number_features]), because I'm using 2035 items, each with 3 features.

SimonCW commented, Mar 14, 2019

Does your item_features matrix (the one you pass to the fit function) include an identity matrix of shape (number_items x number_items)? If not, your model may well be less expressive than pure collaborative filtering. I would recommend building your item features with the build_item_features method of the Dataset class (http://lyst.github.io/lightfm/docs/lightfm.data.html), which includes the item identity matrix by default.
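
To make the shape discussed in this thread concrete, here is a rough sketch that builds such a matrix by hand with scipy.sparse instead of calling LightFM. The metadata assignment (one of 3 features per item, chosen by `i % 3`) is purely hypothetical; only the sizes 2035 and 3 come from the thread.

```python
import numpy as np
from scipy import sparse

n_items, n_features = 2035, 3  # sizes quoted in the thread

# Hypothetical metadata block: each item carries exactly one of the 3 features.
rows = np.arange(n_items)
cols = rows % n_features
metadata = sparse.csr_matrix(
    (np.ones(n_items, dtype=np.float32), (rows, cols)),
    shape=(n_items, n_features),
)

# Identity block: one indicator column per item, so each item keeps
# its own per-item embedding, as in pure collaborative filtering.
identity = sparse.identity(n_items, dtype=np.float32, format="csr")

# Stack side by side: number_items x (number_items + number_features).
item_features = sparse.hstack([identity, metadata], format="csr")
print(item_features.shape)  # (2035, 2038)
```

Without the identity block, every item would be represented only by its shared metadata embeddings, so two symbols with the same sector and industry would be indistinguishable to the model. That is one plausible reason a hybrid model can score below pure CF.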
