What is the right way (format) to input user features to the build_user_features function?

I have been struggling for a while now trying to input my user features into lightfm so I can include them in my recommendation model. I have read many other issues referring to similar problems, but I still can’t manage to solve my problem.

I have my data in a pandas dataframe. My user IDs are strings like "AHS-1", and I’m trying to include one user feature to start. From what I’ve read, the format for passing user features to the build_user_features function is:

[(user_id1, [user_feature1, user_feature2]), (user_id2, [user_feature1, user_feature2]), …]

I’ve tried many options to create this, the latest being creating a dataframe with the user IDs and the feature, then converting each row into a tuple, but I get the error “TypeError: ‘int’ object is not iterable” since my feature is an int.

Here’s my code:

user_features_pd = pd.concat([user_data['mber_id'], user_data.iloc[:, 5].astype(int)], axis=1)
tuples = [tuple(x) for x in user_features_pd.values]
user_features = dataset.build_user_features(tuples)

I would appreciate very much any help!
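
The TypeError most likely arises because the second element of each tuple is iterated over as a collection of feature names, and a bare int cannot be iterated. A minimal sketch of one way to reshape the rows, assuming the integer is a categorical code: the sample rows and the "segment:" prefix are hypothetical and not taken from the question.

```python
# Hypothetical sample rows: (user ID, integer feature), mirroring the
# two-column frame from the question. The values are made up.
rows = [("AHS-1", 62), ("AHS-2", 17), ("AHS-3", 62)]

# Wrap each integer in a list and turn it into a string token, so every
# entry has the shape (user_id, [feature_name, ...]).
user_feature_tuples = [(user_id, [f"segment:{value}"]) for user_id, value in rows]

# Collect every distinct token; these are the names that would need to be
# registered through the user_features argument of Dataset.fit.
all_user_features = sorted({name for _, names in user_feature_tuples for name in names})

print(user_feature_tuples)
print(all_user_features)
```

Under these assumptions, all_user_features would go to dataset.fit and user_feature_tuples to dataset.build_user_features. Skipping the fit step is what produces a "Call fit first" error.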

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

4 reactions
Med-ELOMARI commented, Oct 18, 2019

Well, the error explains itself (Feature 62 not in eature mapping. Call fit first). I think you used only the function that I mentioned. I recommend using the whole class, which fits every unique existing feature before building anything.

Let’s pick a super 😄 small example. Here we have these interactions:

         item_X  item_Y  item_Z
user_A   0       5       1
user_B   1
user_C           5

with these user details:

user_A user_feat1 user_feat2
user_B user_feat3 user_feat4 user_feat2
user_C user_feat1 user_feat4

and these item details:

item_X item_feat1
item_Y item_feat2 item_feat3 item_feat4
item_Z item_feat1 item_feat3 item_feat4

So the right way (format) to present the data is:

interactions = [
    ("user_A", "item_X", 0),
    ("user_A", "item_Y", 5),
    ("user_A", "item_Z", 1),
    ("user_B", "item_X", 1),
    ("user_C", "item_Y", 5),
]
users_features = (
    ["user_A", ["user_feat1", "user_feat2"]],
    ["user_B", ["user_feat3", "user_feat4", "user_feat2"]],
    ["user_C", ["user_feat1", "user_feat4"]],
)
items_features = (
    ["item_X", ["item_feat1"]],
    ["item_Y", ["item_feat2", "item_feat3", "item_feat4"]],
    ["item_Z", ["item_feat1", "item_feat3", "item_feat4"]]
)

First we create a Dataset instance:

from lightfm.data import Dataset
dataset = Dataset()

Then we must fit all the data we have: users, items, and all features. This step is essential because the LightFM model understands numbers, not strings, so each string has to be mapped to a number, and that is what dataset.fit does.

dataset.fit(
    users=["user_A", "user_B", "user_C"],
    items=["item_X", "item_Y", "item_Z"],
    item_features=["item_feat1", "item_feat2", "item_feat3", "item_feat4"],
    user_features=["user_feat1", "user_feat2", "user_feat3", "user_feat4"],
)

You can see that the feature_mapping is a dict of mappings for our data.
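
To make the mapping idea concrete, here is a pure-Python illustration of how strings can be assigned consecutive indices. It is a sketch of the concept, not LightFM's actual implementation. build_mapping and identity_features are hypothetical names, and the assumption that each user also gets its own identity feature reflects my understanding of LightFM's default behaviour.

```python
def build_mapping(ids, features, identity_features=True):
    """Assign consecutive integer indices to IDs and to feature names."""
    id_map, feature_map = {}, {}
    for entity_id in ids:
        # setdefault keeps the first index if an ID shows up twice.
        id_map.setdefault(entity_id, len(id_map))
        if identity_features:
            # Assumption: every entity doubles as its own feature.
            feature_map.setdefault(entity_id, len(feature_map))
    for name in features:
        feature_map.setdefault(name, len(feature_map))
    return id_map, feature_map


user_id_map, user_feature_map = build_mapping(
    ["user_A", "user_B", "user_C"],
    ["user_feat1", "user_feat2", "user_feat3", "user_feat4"],
)
print(user_id_map)
print(user_feature_map)
```

On a fitted Dataset, dataset.mapping() returns the real dictionaries (user IDs, user features, item IDs, item features), which is the safer thing to inspect.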

Then we can build the rest, assuming everything we add here was already fitted (mapped) in the last step:

(interactions, weights) = dataset.build_interactions(interactions)
user_features_list = dataset.build_user_features(users_features)
item_features_list = dataset.build_item_features(items_features)

Then we can feed our model:

from lightfm import LightFM

model = LightFM(no_components=24, loss="warp", k=15)
model.fit(
    interactions=interactions,
    sample_weight=weights,
    item_features=item_features_list,
    user_features=user_features_list,
    verbose=True,
    epochs=10,
    num_threads=20,
)

To predict, we give the mapped indices as input, not the strings (user_A …):

print(model.predict(1, list(range(3))))

Results: [-0.15689075 0.10851561 -0.19980735]

Hope I made it clear now; ask if not 😵 Good luck 🍀
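
A small sketch of translating string IDs into the integer indices that predict expects. The two dictionaries are written out by hand here as an assumption about what the fitted mappings look like for this example. In practice they would come from dataset.mapping(). to_indices is a hypothetical helper.

```python
# Hypothetical mappings of the kind Dataset.fit produces for the example above.
user_id_map = {"user_A": 0, "user_B": 1, "user_C": 2}
item_id_map = {"item_X": 0, "item_Y": 1, "item_Z": 2}


def to_indices(user, items):
    """Translate one user ID and a list of item IDs into internal indices."""
    return user_id_map[user], [item_id_map[item] for item in items]


user_index, item_indices = to_indices("user_B", ["item_X", "item_Y", "item_Z"])
print(user_index, item_indices)

# With a trained model, these indices would then be passed on, for example:
# scores = model.predict(
#     user_index,
#     item_indices,
#     user_features=user_features_list,
#     item_features=item_features_list,
# )
```

When a model was trained with feature matrices, passing the same matrices to predict is generally needed for the features to influence the scores. The commented call shows that variant.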

1 reaction
clementechiu commented, Oct 22, 2019

OK, thank you very much @Med-ELOMARI! I think my problem was probably that I was inputting my continuous variables wrong (not as a dictionary with weights, as suggested in an answer to issue #433). I tried splitting it into chunks as you suggested and it works. Thank you very much for your help!
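
For readers with the same problem, a minimal sketch of the dictionary-with-weights shape mentioned above, where each user maps to {feature name: weight} instead of a list of names. The user IDs, the "age" feature name, and the values are hypothetical.

```python
# Hypothetical continuous values per user (for example, an age column).
ages = {"AHS-1": 34.0, "AHS-2": 51.0, "AHS-3": 27.0}

# One (user_id, {feature_name: weight}) entry per user. Only the feature
# name "age" has to be registered in Dataset.fit, not every distinct value.
user_feature_weights = [(user_id, {"age": age}) for user_id, age in ages.items()]

print(user_feature_weights)
```

build_user_features has a normalize argument that rescales each user's weights by default, so it may be worth checking whether normalize=False, or scaling the raw values beforehand, suits the data better.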
