What is the right way (format) to input user features to the build_user_features function?
I have been struggling for a while now trying to input my user features into lightfm so that I can include them in my recommendation model. I have read many other issues referring to similar problems, but I still can’t manage to solve my problem.
I have my data in a pandas dataframe. My user IDs are strings like "AHS-1", and I’m trying to include one user feature to start. From what I’ve read, the format for passing user features to the build_user_features function is:
[(user_id1, [user_feature1, user_feature2]), (user_id2, [user_feature1, user_feature2]), …]
I’ve tried many ways to create this. The latest was to build a dataframe with the user IDs and the feature and then convert each row into a tuple, but I get the error “TypeError: ‘int’ object is not iterable” since my feature is an int.
Here’s my code:
user_features_pd=pd.concat([user_data['mber_id'],user_data.iloc[:,5].astype(int)], axis=1)
tuples = [tuple(x) for x in user_features_pd.values]
user_features = dataset.build_user_features((tuples))
I would very much appreciate any help!
Issue Analytics
- Created 4 years ago
- Comments:6
Top GitHub Comments
Well, the error explains itself (Feature 62 not in eature mapping. Call fit first). I think you used only the function that I mentioned. I recommend using the whole class, which fits every unique existing feature before building it.
Let’s pick a super 😄 small example. Here we have these interactions:
With these user details:
And these item details:
So the right way or format to present the data is:
First we create a Dataset instance.
Then we must fit all the data we have: users, items, and all features. This step is essential because the LightFM model understands numbers, not strings, so we need to map each string we have to a number, and that’s what dataset.fit does.
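A minimal sketch of the fitting step, since the original snippet is not shown here. The toy data and the helper names `collect_vocab` and `fit_dataset` are hypothetical and only for illustration; `Dataset.fit` is the LightFM call, imported lazily so the helper itself runs without LightFM installed.

```python
def collect_vocab(interactions, user_details):
    """Gather every unique user ID, item ID and user feature name."""
    users = sorted({u for u, _ in interactions} | set(user_details))
    items = sorted({i for _, i in interactions})
    features = sorted({f for feats in user_details.values() for f in feats})
    return users, items, features


def fit_dataset(users, items, user_feature_names):
    """Fit a LightFM Dataset so each ID and feature name gets an integer index."""
    from lightfm.data import Dataset  # requires the lightfm package

    dataset = Dataset()
    dataset.fit(users=users, items=items, user_features=user_feature_names)
    return dataset


# Hypothetical toy data (not the data from this thread).
interactions = [("user_A", "item_1"), ("user_B", "item_2"), ("user_A", "item_2")]
user_details = {"user_A": ["age:young"], "user_B": ["age:old"]}

users, items, user_feature_names = collect_vocab(interactions, user_details)
```

With LightFM installed, `dataset = fit_dataset(users, items, user_feature_names)` would produce the fitted object, and `dataset.mapping()` returns four dicts: user IDs, user features, item IDs, and item features, each mapped to integer indices.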
You can see that the feature_mapping is a dict of mappings for our data:
Then we can build the rest (assuming everything we add here was already fitted (mapped) in the last step):
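As a hedged sketch of the build step: `build_user_features` expects an iterable of `(user_id, features)` pairs, where `features` is a list of feature names (or a dict of name to weight). A bare int in the second position is what leads to `TypeError: 'int' object is not iterable`. The feature name `segment`, the example values, and the helpers `to_feature_tuples` and `fit_and_build` are assumptions made for illustration.

```python
def to_feature_tuples(rows, feature_name="segment"):
    """Turn (user_id, int_value) rows into (user_id, [feature_token]) pairs."""
    return [(user_id, [f"{feature_name}:{value}"]) for user_id, value in rows]


def fit_and_build(rows, items):
    """Fit the tokens first, then build the user feature matrix."""
    from lightfm.data import Dataset  # requires the lightfm package

    feature_tuples = to_feature_tuples(rows)
    tokens = sorted({t for _, feats in feature_tuples for t in feats})

    dataset = Dataset()
    # Every user ID and every feature token must be fitted before building.
    dataset.fit(users=[u for u, _ in rows], items=items, user_features=tokens)
    return dataset, dataset.build_user_features(feature_tuples)


# Hypothetical rows: (user ID, integer-coded feature).
rows = [("AHS-1", 62), ("AHS-2", 17)]
feature_tuples = to_feature_tuples(rows)
```

Encoding the int as a string token such as `segment:62` treats it as a category. Wrapping the raw int in a list, `[62]`, would also satisfy the expected format, provided `62` itself was passed to `fit` as a feature name.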
Then we can feed our model:
To predict, we give the mapped indices as input, not the strings (user_A …):
Results [-0.15689075 0.10851561 -0.19980735]
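One possible way to translate raw IDs into the indices that `predict` expects is sketched below. It assumes a fitted `Dataset` and a fitted `LightFM` model already exist; `to_indices` and `predict_for_user` are hypothetical helper names, and the mapping dict is made-up data shaped like what `dataset.mapping()` returns.

```python
def to_indices(ids, id_map):
    """Translate raw string IDs into the integer indices LightFM works with."""
    missing = [i for i in ids if i not in id_map]
    if missing:
        raise KeyError(f"IDs not seen during fit: {missing}")
    return [id_map[i] for i in ids]


def predict_for_user(model, dataset, user_id, item_ids, user_features=None):
    """Score item_ids for one user with an already-fitted model and dataset."""
    import numpy as np

    user_id_map, _, item_id_map, _ = dataset.mapping()
    user_idx = user_id_map[user_id]
    item_idx = np.array(to_indices(item_ids, item_id_map), dtype=np.int32)
    # Pass the same user_features matrix that was used to train the model.
    return model.predict(user_idx, item_idx, user_features=user_features)


# Hypothetical item mapping, shaped like the third dict from dataset.mapping().
item_id_map = {"item_1": 0, "item_2": 1, "item_3": 2}
item_indices = to_indices(["item_3", "item_1"], item_id_map)
```

Failing early on unknown IDs is a design choice of this sketch: an ID that was never passed to `fit` has no index, so it cannot be scored.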
Hope I made it clear now; ask if not 😵 Good luck 🍀
OK, thank you very much @Med-ELOMARI! I think my problem was probably that I was inputting my continuous variables incorrectly (not as a dictionary with weights, as suggested in an answer in issue #433). I tried splitting it into chunks as you suggested and it works. Thank you very much for your help!
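For readers with the same problem, the dictionary-with-weights format for a continuous variable might look like the sketch below. The feature name `age`, the values, and the helpers `to_weighted_features` and `build_weighted` are assumptions for illustration, not taken from the thread.

```python
def to_weighted_features(rows, feature_name):
    """Turn (user_id, number) rows into (user_id, {feature_name: weight}) pairs."""
    return [(user_id, {feature_name: float(value)}) for user_id, value in rows]


def build_weighted(dataset, weighted):
    """Build the user feature matrix from an already-fitted LightFM Dataset.

    The feature name must have been passed to dataset.fit(user_features=...).
    normalize=False keeps the raw weights instead of normalizing each row.
    """
    return dataset.build_user_features(weighted, normalize=False)


# Hypothetical rows: (user ID, continuous value).
rows = [("AHS-1", 20), ("AHS-2", 40), ("AHS-3", 60)]
weighted = to_weighted_features(rows, "age")
```

Here the single feature name `age` is shared by all users and the number becomes its weight, so only `age` needs to be fitted rather than every distinct value.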