Majority of the top N recommended items for users are mostly similar
This is my code:
import numpy as np
import pandas as pd
from lightfm import LightFM
from lightfm.data import Dataset


def make_model_for_recommendation_with_dataset():
    _dataset = Dataset()
    _data_pd = pd.read_csv("../ml-latest-small/ratings.csv")
    movies_pd = pd.read_csv('../ml-latest-small/movies.csv')
    _dataset.fit(users=_data_pd["userId"], items=_data_pd["movieId"])
    # Scale ratings so that 0.0 < weight <= 1.0
    ratings_gp = _data_pd["rating"].apply(lambda x: x / 5.0)
    interaction_list = zip(_data_pd["userId"], _data_pd["movieId"], ratings_gp)
    interactions, weights = _dataset.build_interactions(interaction_list)
    model = LightFM(loss='warp')
    # Note: `weights` is built above but is not passed to fit()
    model.fit(interactions, num_threads=2)
    # mapping() returns (user ids, user features, item ids, item features)
    items_id_mapping = _dataset.mapping()[2]
    max_inner_id = max(items_id_mapping.values()) + 1
    # np.object was removed in NumPy 1.24; the builtin object is equivalent
    item_labels = np.empty(max_inner_id, dtype=object)
    for movie_id, title in zip(movies_pd['movieId'], movies_pd['title']):
        iid = int(movie_id)
        if iid in items_id_mapping:
            item_labels[items_id_mapping[iid]] = title
    return (model, {
        "train": interactions,
        "item_labels": item_labels,
        "mapping": _dataset.mapping()
    })


def sample_recommendation(_model, _data, user_ids):
    n_users, n_items = _data['train'].shape
    for user_id in user_ids:
        if 'mapping' in _data.keys():
            # Translate the external userId into LightFM's internal index
            user_id_mapping, _, _, _ = _data['mapping']
            user_id = user_id_mapping[user_id]
        known_positives = _data['item_labels'][_data['train'].tocsr()[user_id].indices]
        scores = _model.predict(user_id, np.arange(n_items))
        # Sort items by descending score
        top_items = _data['item_labels'][np.argsort(-scores)]
        print("User %s" % user_id)
        print(" Known positives:")
        for x in known_positives[:6]:
            print(" %s" % x)
        print(" Recommended:")
        for x in top_items[:6]:
            print(" %s" % x)


def recommendations():
    model, data = make_model_for_recommendation_with_dataset()
    while True:
        _user_id = int(input("Enter user id (0 for exit):\n").strip(" \n"))
        if _user_id == 0:
            break
        sample_recommendation(model, data, [_user_id])


recommendations()
But the top N recommended items are nearly the same for every user, like this:
userId = 112
  Known positives:
    Pulp Fiction (1994)
    Twister (1996)
    Birdcage, The (1996)
    Willy Wonka & the Chocolate Factory (1971)
    Star Trek: First Contact (1996)
    Grumpier Old Men (1995)
  Recommended:
    Forrest Gump (1994)
    Star Wars: Episode IV - A New Hope (1977)
    Silence of the Lambs, The (1991)
    Jurassic Park (1993)
    Toy Story (1995)
    Schindler's List (1993)

userId = 113
  Known positives:
    GoldenEye (1995)
    Aristocats, The (1970)
    Happy Gilmore (1996)
    Get Shorty (1995)
    Dead Man Walking (1995)
    Jumanji (1995)
  Recommended:
    Forrest Gump (1994)
    Star Wars: Episode IV - A New Hope (1977)
    Jurassic Park (1993)
    Silence of the Lambs, The (1991)
    Toy Story (1995)
    Schindler's List (1993)

userId = 25
  Known positives:
    Beavis and Butt-Head Do America (1996)
    Trainspotting (1996)
    Star Wars: Episode IV - A New Hope (1977)
    Willy Wonka & the Chocolate Factory (1971)
    Star Trek: First Contact (1996)
    Grumpier Old Men (1995)
  Recommended:
    Forrest Gump (1994)
    Star Wars: Episode IV - A New Hope (1977)
    Silence of the Lambs, The (1991)
    Schindler's List (1993)
    Pulp Fiction (1994)
    Jurassic Park (1993)
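One way to put a number on "mostly similar" is to compare the top-N sets across users. The following is a minimal sketch and not part of the original post: `top_n_overlap` and `catalog_coverage` are hypothetical helper names, and `scores` is assumed to be a dense (n_users, n_items) array, for example built by stacking the `_model.predict(user_id, np.arange(n_items))` results from the code above for a sample of at least two users.

```python
import numpy as np


def top_n_overlap(scores, n=6):
    """Mean pairwise Jaccard overlap between users' top-n item sets.

    scores: array of shape (n_users, n_items); higher means "recommend first".
    Needs at least two users, otherwise there are no pairs to compare.
    """
    scores = np.asarray(scores)
    top = np.argsort(-scores, axis=1)[:, :n]
    sets = [set(row.tolist()) for row in top]
    overlaps = []
    for a in range(len(sets)):
        for b in range(a + 1, len(sets)):
            inter = len(sets[a] & sets[b])
            union = len(sets[a] | sets[b])
            overlaps.append(inter / union)
    return float(np.mean(overlaps))


def catalog_coverage(scores, n=6):
    """Fraction of all items that show up in at least one user's top-n."""
    scores = np.asarray(scores)
    top = np.argsort(-scores, axis=1)[:, :n]
    return len(np.unique(top)) / scores.shape[1]
```

An overlap close to 1.0 together with a coverage close to n / n_items would suggest that the ranking is driven by something that does not depend on the user, such as item popularity.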
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6
Top GitHub Comments
@merrcury hard to say from here. I don’t have much experience with the MovieLens dataset. Have you compared your results with the example in the documentation to verify that your code is fine?
In general, popularity bias (over-emphasizing very popular items in the recommendations) is common in recommender systems. One easy fix that helped in my case was to “ignore” the item biases (you can think of them as “encoding popularity”) when scoring the model, like so:
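The snippet from the original comment is not preserved in this copy. As a minimal sketch of one way to do it, assuming a fitted LightFM model whose `item_biases` attribute is a NumPy array (`zero_item_biases` is a hypothetical helper name, not a LightFM function):

```python
import numpy as np


def zero_item_biases(model):
    """Set a fitted model's item biases to zero and return the old values.

    Assumption: model.item_biases is a NumPy array, as on a fitted LightFM
    model. With item features, the biases are per feature rather than per item.
    """
    original = model.item_biases.copy()
    # zeros_like keeps the dtype and shape the model already uses
    model.item_biases = np.zeros_like(model.item_biases)
    return original
```

Calling `saved = zero_item_biases(model)` after `model.fit(...)` and before scoring would apply it; assigning `model.item_biases = saved` should restore the trained values.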
Afterwards, if you call predict on the model, the scores won’t be influenced by the item biases.
I have the same problem. My matrix size is:
Num users: 287823, num_items 56393
The model gives the same recommendations for all users.
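Identical top-N lists for every user are what you would expect when item biases dominate the score. The toy example below illustrates this with plain NumPy, under a simplified assumption about the scoring rule: score(user, item) = user embedding · item embedding + item bias. The per-user bias is left out because it does not change the ranking for a given user. All values here are made up for illustration and do not come from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 5, 50, 8

# Hypothetical small embeddings: each dot product stays within [-0.08, 0.08]
user_emb = rng.uniform(-0.1, 0.1, size=(n_users, dim))
item_emb = rng.uniform(-0.1, 0.1, size=(n_items, dim))
# Hypothetical "popularity" biases, spaced far wider than the dot products
item_bias = np.arange(n_items, dtype=float)


def top_n(scores, n=6):
    """Indices of the n highest-scoring items per user, best first."""
    return np.argsort(-scores, axis=1)[:, :n]


personal = user_emb @ item_emb.T
top_with = top_n(personal + item_bias)  # bias included in the score
top_without = top_n(personal)           # bias ignored when scoring

# With the bias, every user gets the six highest-bias items in the same order
print(top_with)
# Without it, the ranking depends on each user's embedding
print(top_without)
```

Whether removing the biases actually gives useful recommendations on real data depends on how much signal the embeddings have learned, so it is worth checking with an evaluation metric and not only by eye.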