What is the right way (format) to input user features to the build_user_features function?

I have been struggling for a while now trying to input my user features into lightfm so I can include them in my recommendation model. I have read many other issues referring to similar problems, but I still can’t manage to solve my problem.

I have my data in a pandas dataframe. My user IDs are strings like "AHS-1", and I’m trying to include one user feature to start. From what I’ve read, the format for passing user features to the build_user_features function is:

[[user_id1, [user_feature1, user_feature2]], [user_id2, [user_feature1, user_feature2]], …]

I’ve tried many options to create this, the latest being creating a dataframe with the user IDs and the feature, then converting this into a tuple, but I get the error “TypeError: ‘int’ object is not iterable” since my feature is an int.

Here’s my code:

user_features_pd=pd.concat([user_data['mber_id'],user_data.iloc[:,5].astype(int)], axis=1) 
tuples = [tuple(x) for x in user_features_pd.values]
user_features = dataset.build_user_features((tuples))

I would very much appreciate any help!
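The TypeError above is consistent with each row carrying a bare int where build_user_features expects an iterable of feature names. A minimal sketch of one way to reshape the rows, assuming the int should be treated as a categorical feature name; the `rows` list is a made-up stand-in for `user_features_pd.values`, and `to_feature_tuples` is a hypothetical helper, not part of lightfm:

```python
# Hypothetical stand-in for `user_features_pd.values`:
# one (user_id, feature) row per user, where the feature is a plain int.
rows = [("AHS-1", 3), ("AHS-2", 7), ("AHS-3", 3)]


def to_feature_tuples(rows):
    """Wrap each scalar feature in a list so the second element is iterable."""
    return [(user_id, [value]) for user_id, value in rows]


user_feature_tuples = to_feature_tuples(rows)

# Every feature name used here would also need to be passed to
# dataset.fit(user_features=...) before building the feature matrix.
all_feature_names = sorted({name for _, names in user_feature_tuples for name in names})

print(user_feature_tuples)
print(all_feature_names)
```

The reshaped list is what would then be handed to dataset.build_user_features in place of `tuples`.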

Top GitHub Comments

Med-ELOMARI commented, Oct 18, 2019

Well, the error explains itself (Feature 62 not in eature mapping. Call fit first). I think you used only the function that I mentioned. I recommend you use the whole class, which will fit every unique existing feature before building it.

Let’s pick a super 😄 small example. Here we have these interactions:

	item_X	item_Y	item_Z
user_A	0	5	1
user_B	1
user_C		5

with these user details:


user_A	user_feat1	user_feat2
user_B	user_feat3	user_feat4	user_feat2
user_C	user_feat1	user_feat4

and these item details:


item_X	item_feat1
item_Y	item_feat2	item_feat3	item_feat4
item_Z	item_feat1	item_feat3	item_feat4

So the right way (format) to present the data is:

interactions = [
    ("user_A", "item_X", 0),
    ("user_A", "item_Y", 5),
    ("user_A", "item_Z", 1),
    ("user_B", "item_X", 1),
    ("user_C", "item_Y", 5),
]
users_features = (
    ["user_A", ["user_feat1", "user_feat2"]],
    ["user_B", ["user_feat3", "user_feat4", "user_feat2"]],
    ["user_C", ["user_feat1", "user_feat4"]],
)
items_features = (
    ["item_X", ["item_feat1"]],
    ["item_Y", ["item_feat2", "item_feat3", "item_feat4"]],
    ["item_Z", ["item_feat1", "item_feat3", "item_feat4"]]
)

First we create a Dataset instance:

from lightfm.data import Dataset
dataset = Dataset()

Then we must fit all the data we have: users, items, and all features. This step is essential because the LightFM model understands numbers, not strings, so we need to map each string we have to a number, and that’s what dataset.fit does:

dataset.fit(
    users=["user_A", "user_B", "user_C"],
    items=["item_X", "item_Y", "item_Z"],
    item_features=["item_feat1", "item_feat2", "item_feat3", "item_feat4"],
    user_features=["user_feat1", "user_feat2", "user_feat3", "user_feat4"],
)
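With real data the id and feature lists are usually too long to type out by hand. A small sketch of deriving them from the structures defined earlier, assuming the same `interactions`, `users_features` and `items_features` shapes; `unique_in_order` is a hypothetical helper, not a lightfm function:

```python
interactions = [
    ("user_A", "item_X", 0),
    ("user_A", "item_Y", 5),
    ("user_A", "item_Z", 1),
    ("user_B", "item_X", 1),
    ("user_C", "item_Y", 5),
]
users_features = (
    ["user_A", ["user_feat1", "user_feat2"]],
    ["user_B", ["user_feat3", "user_feat4", "user_feat2"]],
    ["user_C", ["user_feat1", "user_feat4"]],
)
items_features = (
    ["item_X", ["item_feat1"]],
    ["item_Y", ["item_feat2", "item_feat3", "item_feat4"]],
    ["item_Z", ["item_feat1", "item_feat3", "item_feat4"]],
)


def unique_in_order(values):
    """Drop duplicates while keeping first-appearance order."""
    return list(dict.fromkeys(values))


users = unique_in_order(user for user, _, _ in interactions)
items = unique_in_order(item for _, item, _ in interactions)
user_feature_names = unique_in_order(f for _, feats in users_features for f in feats)
item_feature_names = unique_in_order(f for _, feats in items_features for f in feats)

# These four lists are what would be passed to dataset.fit(...).
print(users, items, user_feature_names, item_feature_names, sep="\n")
```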

Each of the resulting mappings (for example the feature mapping) is a dict from our strings to integer indices.

Then we can build the rest (assuming everything we add here was already fitted, i.e. mapped, in the previous step):

(interactions, weights) = dataset.build_interactions(interactions)
user_features_list = dataset.build_user_features(users_features)
item_features_list = dataset.build_item_features(items_features)
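The "Call fit first" error quoted at the top of this comment shows up when a feature reaches the build step without having been fitted. A rough pre-flight check, assuming the same (id, [features]) layout as above; `find_unfitted` is a hypothetical helper and the stray feature 62 is added here only to illustrate the failure case:

```python
def find_unfitted(features, fitted_names):
    """Return feature names used in `features` that are absent from `fitted_names`."""
    fitted = set(fitted_names)
    missing = set()
    for _, names in features:
        # Iterating a dict of {name: weight} yields its keys, so both layouts work.
        missing.update(name for name in names if name not in fitted)
    # key=str keeps the sort stable even if ints and strings are mixed.
    return sorted(missing, key=str)


fitted_user_features = ["user_feat1", "user_feat2", "user_feat3", "user_feat4"]
users_features_with_stray = (
    ["user_A", ["user_feat1", "user_feat2"]],
    ["user_B", ["user_feat3", "user_feat4", "user_feat2"]],
    ["user_C", ["user_feat1", 62]],  # 62 was never passed to fit
)

unfitted = find_unfitted(users_features_with_stray, fitted_user_features)
print(unfitted)
```

An empty result suggests the build calls should not trip over unknown features; anything listed would need to be added to the fit call first.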

Then we can feed our model:

from lightfm import LightFM

model = LightFM(no_components=24, loss="warp", k=15)
model.fit(
    interactions=interactions,
    sample_weight=weights,
    item_features=item_features_list,
    user_features=user_features_list,
    verbose=True,
    epochs=10,
    num_threads=20,
)

To predict, we pass the mapped integer indices as input, not the strings (user_A …):

print(model.predict(1, list(range(3))))

Results [-0.15689075 0.10851561 -0.19980735]
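To go back and forth between the strings and the integer indices, the id mappings can be used as lookup tables. In lightfm these dicts are available from dataset.mapping(), which returns the user id, user feature, item id and item feature mappings in that order. The sketch below hard-codes stand-in dicts, assuming ids were numbered in the order they were passed to fit; `to_internal` is a hypothetical helper, and the scores are copied from the result above:

```python
# Stand-ins for the user and item id dicts returned by dataset.mapping();
# the numbering is an assumption based on the order used in dataset.fit.
user_id_map = {"user_A": 0, "user_B": 1, "user_C": 2}
item_id_map = {"item_X": 0, "item_Y": 1, "item_Z": 2}


def to_internal(external_ids, id_map):
    """Translate external string ids into internal integer indices."""
    return [id_map[external_id] for external_id in external_ids]


user_index = user_id_map["user_B"]
item_indices = to_internal(["item_X", "item_Y", "item_Z"], item_id_map)

# Scores copied from the printed result above (user index 1, items 0..2).
scores = [-0.15689075, 0.10851561, -0.19980735]

# Invert the item mapping to turn indices back into readable names.
index_to_item = {index: name for name, index in item_id_map.items()}
ranked = sorted(zip(item_indices, scores), key=lambda pair: pair[1], reverse=True)
ranked_items = [index_to_item[index] for index, _ in ranked]

print(user_index, item_indices, ranked_items)
```

Under that numbering assumption, the call above scores user_B, and item_Y comes out on top.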

Hope that makes it clear now; ask if not 😵 Good luck 🍀

clementechiu commented, Oct 22, 2019

OK, thank you very much @Med-ELOMARI! I think my problem was probably that I was inputting my continuous variables the wrong way (not as a dictionary with weights, as suggested in an answer to issue #433). I tried splitting it into chunks as you suggested and it works. Thank you very much for your help!
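For readers wondering what the dictionary-with-weights form looks like: build_user_features also accepts (user id, {feature name: feature weight}) pairs, so a continuous value can be passed as the weight of one named feature. A minimal sketch with made-up rows; the feature name "age" and the helper `to_weighted_features` are hypothetical, not taken from the thread:

```python
# Hypothetical (user_id, continuous value) rows, e.g. taken from a dataframe.
rows = [("AHS-1", 34.0), ("AHS-2", 51.0), ("AHS-3", 27.5)]


def to_weighted_features(rows, feature_name):
    """Express a continuous column as {feature_name: weight} per user."""
    return [(user_id, {feature_name: float(value)}) for user_id, value in rows]


weighted_user_features = to_weighted_features(rows, "age")

# Only the feature *name* needs fitting, e.g. dataset.fit(..., user_features=["age"]);
# the values travel as weights in dataset.build_user_features(weighted_user_features).
print(weighted_user_features)
```

Note that build_user_features normalizes each row by default, so it may be worth checking whether normalize=False suits the data better.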