
Majority of the top N recommended items for users are mostly similar

This is my code:

import numpy as np
import pandas as pd
from lightfm import LightFM
from lightfm.data import Dataset


def make_model_for_recommendation_with_dataset():
    _dataset = Dataset()
    _data_pd = pd.read_csv("../ml-latest-small/ratings.csv")
    movies_pd = pd.read_csv('../ml-latest-small/movies.csv')
    _dataset.fit(users=_data_pd["userId"], items=_data_pd["movieId"])
    ratings_gp = _data_pd["rating"].apply(lambda x: x / 5.0)  # scale ratings so that 0.0 <= weight <= 1.0
    interaction_list = (zip(_data_pd["userId"], _data_pd["movieId"], ratings_gp))
    interactions, weights = _dataset.build_interactions(interaction_list)
    model = LightFM(loss='warp')
    # weights is not passed to fit() here, so the scaled ratings are unused
    model.fit(interactions, num_threads=2)
    items_id_mapping = _dataset.mapping()[2]
    max_inner_id = max(items_id_mapping.values()) + 1
    item_labels = np.empty(max_inner_id, dtype=object)  # np.object was removed in NumPy 1.24
    for movie_id, title in list(zip(movies_pd['movieId'], movies_pd['title'])):
        iid = int(movie_id)
        if iid in items_id_mapping:
            item_labels[items_id_mapping[iid]] = title
    return (model, {
        "train": interactions,
        "item_labels": item_labels,
        "mapping": _dataset.mapping()
    })


def sample_recommendation(_model, _data, user_ids):
    n_users, n_items = _data['train'].shape
    for user_id in user_ids:
        if 'mapping' in _data.keys():
            user_id_mapping, _, _, _ = _data['mapping']
            user_id = user_id_mapping[user_id]
        known_positives = _data['item_labels'][_data['train'].tocsr()[user_id].indices]
        scores = _model.predict(user_id, np.arange(n_items))
        top_items = _data['item_labels'][np.argsort(-scores)]
        print("User %s" % user_id)
        print("     Known positives:")
        for x in known_positives[:6]:
            print("        %s" % x)
        print("     Recommended:")
        for x in top_items[:6]:
            print("        %s" % x)


def recommendations():
    model, data = make_model_for_recommendation_with_dataset()
    while True:
        _user_id = int(input("Enter user id (0 for exit):\n").strip(" \n"))
        if _user_id == 0:
            break
        sample_recommendation(model, data, [_user_id])


recommendations()

But the top N recommended items are nearly identical for every user, like this:

userId = 112
     Known positives:
        Pulp Fiction (1994)
        Twister (1996)
        Birdcage, The (1996)
        Willy Wonka & the Chocolate Factory (1971)
        Star Trek: First Contact (1996)
        Grumpier Old Men (1995)
     Recommended:
        Forrest Gump (1994)
        Star Wars: Episode IV - A New Hope (1977)
        Silence of the Lambs, The (1991)
        Jurassic Park (1993)
        Toy Story (1995)
        Schindler's List (1993)

userId = 113
     Known positives:
        GoldenEye (1995)
        Aristocats, The (1970)
        Happy Gilmore (1996)
        Get Shorty (1995)
        Dead Man Walking (1995)
        Jumanji (1995)
     Recommended:
        Forrest Gump (1994)
        Star Wars: Episode IV - A New Hope (1977)
        Jurassic Park (1993)
        Silence of the Lambs, The (1991)
        Toy Story (1995)
        Schindler's List (1993)

userId = 25
     Known positives:
        Beavis and Butt-Head Do America (1996)
        Trainspotting (1996)
        Star Wars: Episode IV - A New Hope (1977)
        Willy Wonka & the Chocolate Factory (1971)
        Star Trek: First Contact (1996)
        Grumpier Old Men (1995)
     Recommended:
        Forrest Gump (1994)
        Star Wars: Episode IV - A New Hope (1977)
        Silence of the Lambs, The (1991)
        Schindler's List (1993)
        Pulp Fiction (1994)
        Jurassic Park (1993)
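
One way to put a number on "mostly similar" is the Jaccard overlap between the recommended lists. The following is a minimal sketch that uses only the three lists printed above; the helper name `jaccard` and the `recommended` dictionary are introduced here for illustration and are not part of the original code.

```python
from itertools import combinations

# Top-6 recommendations copied from the output above, keyed by userId.
recommended = {
    112: [
        "Forrest Gump (1994)",
        "Star Wars: Episode IV - A New Hope (1977)",
        "Silence of the Lambs, The (1991)",
        "Jurassic Park (1993)",
        "Toy Story (1995)",
        "Schindler's List (1993)",
    ],
    113: [
        "Forrest Gump (1994)",
        "Star Wars: Episode IV - A New Hope (1977)",
        "Jurassic Park (1993)",
        "Silence of the Lambs, The (1991)",
        "Toy Story (1995)",
        "Schindler's List (1993)",
    ],
    25: [
        "Forrest Gump (1994)",
        "Star Wars: Episode IV - A New Hope (1977)",
        "Silence of the Lambs, The (1991)",
        "Schindler's List (1993)",
        "Pulp Fiction (1994)",
        "Jurassic Park (1993)",
    ],
}


def jaccard(a, b):
    """Share of distinct items that two lists have in common (order is ignored)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Pairwise overlap for every pair of users.
overlaps = {
    (u, v): jaccard(recommended[u], recommended[v])
    for u, v in combinations(recommended, 2)
}

for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Users 112 and 113 receive the same six titles in a slightly different order, and user 25 shares five of the six with each of them.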

Issue Analytics

Created 3 years ago
Comments: 6

Top GitHub Comments

SimonCW commented, Mar 4, 2021

@merrcury hard to say from here. I don’t have much experience with the MovieLens dataset. Have you compared your results with the example in the documentation to verify that your code is fine?

In general, popularity bias (over-emphasizing very popular items in the recommendations) is common in recommender systems. One easy fix that helped in my case was to “ignore” the item biases (you can think of them as “encoding popularity”) when scoring the model, like so:

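import numpy as np

# get_item_representations returns (item biases, item embeddings); keep the biases
recipe_biases = model.get_item_representations(features=None)[0]
# overwrite the learned biases with zeros of the same shape and dtype
zero_recipe_biases = np.zeros_like(recipe_biases)
model.item_biases = zero_recipe_biases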

Afterwards, if you call predict on the model, the scores won’t be influenced by the item biases.
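
To see why zeroing the item biases can change the ranking, here is a minimal NumPy sketch. It assumes the usual factorization scoring of a user-item dot product plus an item bias. The embeddings and biases are made-up toy values, not output from a trained LightFM model, and `top_item_per_user` is a hypothetical helper.

```python
import numpy as np

# Toy values for illustration only: 3 users, 4 items, 2 latent dimensions.
user_embeddings = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
])
item_embeddings = np.array([
    [0.9, 0.1],  # item 0: close to user 0's taste
    [0.1, 0.9],  # item 1: close to user 1's taste
    [0.8, 0.8],  # item 2: close to user 2's taste
    [0.2, 0.2],  # item 3: weak match for everyone, but very popular
])
# A large bias on item 3 stands in for "popularity".
item_biases = np.array([0.0, 0.0, 0.0, 2.0])


def top_item_per_user(user_emb, item_emb, biases):
    """Return the index of the best-scoring item for each user.

    A per-user bias is left out because it shifts all of a user's scores
    equally and so cannot change that user's ranking.
    """
    scores = user_emb @ item_emb.T + biases  # shape: (n_users, n_items)
    return scores.argmax(axis=1)


with_bias = top_item_per_user(user_embeddings, item_embeddings, item_biases)
without_bias = top_item_per_user(
    user_embeddings, item_embeddings, np.zeros_like(item_biases)
)

print("with item biases:   ", with_bias)
print("without item biases:", without_bias)
```

With the bias in place, every user's top item is the popular item 3. With the biases zeroed, each user gets the item closest to their own embedding. Whether this helps on a real dataset depends on how much of the signal the trained embeddings carry.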

shakeel-appyhigh commented, Mar 2, 2021

I have the same problem. My matrix has 287823 users and 56393 items, and the model gives the same recommendations for all users.
