Collaborative Filtering outperforming Hybrid
2019-03-05 21:02:41,609 [MainThread ] [INFO ] Begin fitting collaborative filtering model...
2019-03-05 21:02:41,688 [MainThread ] [INFO ] Collaborative Filtering training set AUC: 0.93749386
2019-03-05 21:02:41,707 [MainThread ] [INFO ] Collaborative Filtering test set AUC: 0.9080546
2019-03-05 21:02:41,751 [MainThread ] [INFO ] Collaborative Filtering training set Precision@10: 0.5878049
2019-03-05 21:02:41,765 [MainThread ] [INFO ] Collaborative Filtering test set Precision@10: 0.103797466
2019-03-05 21:02:41,808 [MainThread ] [INFO ] Collaborative Filtering training set Recall@10: 0.15788174721859297
2019-03-05 21:02:41,822 [MainThread ] [INFO ] Collaborative Filtering test set Recall@10: 0.11484959933052774
2019-03-05 21:02:41,823 [MainThread ] [INFO ] Collaborative Filtering training set F1 Score: 0.24890794393000912
2019-03-05 21:02:41,823 [MainThread ] [INFO ] Collaborative Filtering test set F1 Score: 0.10904420177996955
2019-03-05 21:02:41,867 [MainThread ] [INFO ] Collaborative Filtering training set MRR: 0.8419941
2019-03-05 21:02:41,881 [MainThread ] [INFO ] Collaborative Filtering test set MRR: 0.23103695
2019-03-05 21:02:41,881 [MainThread ] [INFO ] Begin fitting hybrid model...
2019-03-05 21:02:45,425 [MainThread ] [INFO ] Hybrid training set AUC: 0.89809555
2019-03-05 21:02:45,791 [MainThread ] [INFO ] Hybrid test set AUC: 0.88973016
2019-03-05 21:02:46,370 [MainThread ] [INFO ] Hybrid training set Precision@10: 0.41646343
2019-03-05 21:02:46,773 [MainThread ] [INFO ] Hybrid test set Precision@10: 0.09050632
2019-03-05 21:02:47,336 [MainThread ] [INFO ] Hybrid training set Recall@10: 0.07391322142347932
2019-03-05 21:02:47,719 [MainThread ] [INFO ] Hybrid test set Recall@10: 0.06227174211311939
2019-03-05 21:02:47,719 [MainThread ] [INFO ] Hybrid training set F1 Score: 0.12554494052412787
2019-03-05 21:02:47,719 [MainThread ] [INFO ] Hybrid test set F1 Score: 0.07378004680470172
2019-03-05 21:02:48,303 [MainThread ] [INFO ] Hybrid training set MRR: 0.6455854
2019-03-05 21:02:48,689 [MainThread ] [INFO ] Hybrid test set MRR: 0.2648911
Dataset format: Data JSON
My interactions are between users and symbols, with item metadata for each symbol: item_sector (around 9 values in total), item_industry (around 210 values in total), and other metadata such as trending score and watchlist count, both of which I've normalised.
What could cause the CF model to outperform the hybrid? I thought there might be too many item features, so I removed trending score and watchlist count and kept only item industry and sector to classify/group each symbol.
Issue Analytics
- State:
- Created 5 years ago
- Comments:7
Top GitHub Comments
(number_items x [number_items + number_features]) is correct.
On Sat, 8 Jun 2019 at 22:06, DaStapo notifications@github.com wrote:
Does your item_features matrix (that you provide to the fit function) include an identity matrix of shape (number_items x number_items)? If not, it might well be that your model is less expressive than pure collaborative filtering. I would recommend building your item features with the build_item_features method from the Dataset class (http://lyst.github.io/lightfm/docs/lightfm.data.html), where the default option is to include the item identity matrix.
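To make the shape discussed above concrete, here is a minimal pure-Python sketch of an item feature matrix laid out as (number_items x [number_items + number_features]). It is an illustration only, not LightFM's implementation: the helper name sketch_item_features, the symbols SYM_A/SYM_B/SYM_C and the feature names are all hypothetical, and the real Dataset.build_item_features returns a sparse matrix and row-normalises by default, which this sketch leaves out.

```python
def sketch_item_features(item_ids, feature_names, item_metadata, include_identity=True):
    """Build a dense item feature matrix as a list of rows (one row per item).

    With include_identity=True the first len(item_ids) columns form an
    identity block, so every item keeps its own per-item column in
    addition to the shared metadata columns.
    """
    feature_index = {name: j for j, name in enumerate(feature_names)}
    offset = len(item_ids) if include_identity else 0
    width = offset + len(feature_names)

    matrix = [[0.0] * width for _ in item_ids]
    for i, item in enumerate(item_ids):
        if include_identity:
            matrix[i][i] = 1.0  # per-item indicator column
        for name in item_metadata.get(item, []):
            matrix[i][offset + feature_index[name]] = 1.0  # shared metadata column
    return matrix


# Hypothetical symbols and metadata, for illustration only.
items = ["SYM_A", "SYM_B", "SYM_C"]
features = ["sector:technology", "sector:energy", "industry:software", "industry:oil"]
metadata = {
    "SYM_A": ["sector:technology", "industry:software"],
    "SYM_B": ["sector:technology", "industry:software"],
    "SYM_C": ["sector:energy", "industry:oil"],
}

with_identity = sketch_item_features(items, features, metadata, include_identity=True)
without_identity = sketch_item_features(items, features, metadata, include_identity=False)

# Shape is (number_items x [number_items + number_features]) with the identity block.
print(len(with_identity), len(with_identity[0]))
# Without it, SYM_A and SYM_B have identical rows and cannot be told apart.
print(without_identity[0] == without_identity[1])
```

In the second matrix, two symbols that share a sector and an industry get exactly the same row, so a model trained on it has no way to represent them differently. That is one way a metadata-only hybrid can end up less expressive than pure collaborative filtering.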