Correct way of creating Item/User features with Dataset class
Hi Maciej and everyone else 😃,
I am using LightFM in my school project with the Yelp Academic dataset. I’ve looked at some previous issues, but I think none of them describes specifically what I’m looking for (if I’m wrong, sorry for the duplicate).
So, I want to incorporate item/user features and create them with the `Dataset` class, but I don’t know if I’m doing it right (I have created some and everything seems to work, but I don’t know whether it is correct). The Yelp dataset has various types of feature values: a lot of them are just `True`/`False`, some lie in a given range or are continuous (e.g. price range or opening hours), and some are categorical.
Currently I am preparing the features as a collection of `(item id, [list of feature names])` tuples.
Let’s say I want to create features from the columns `price_range` (range 1-5), `accept_credit_cards` (bool), `smoking_allowed` (bool) and `category` (str). The prepared collection of tuples then looks like this, for example:
```python
[
    ("item1", [1, False, True, "bar"]),
    ("item2", [4, True, False, "restaurant"]),
    ("item3", [3, True, True, "burgers"]),
    ...
]
```
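One common way to make the column matter (a sketch, not an answer confirmed in this thread) is to turn every value into a feature name that includes its column, e.g. `smoking_allowed:True`. With bare values, `True` coming from two different columns is the same feature name. Worse, in Python `True == 1` and both hash the same, so `True` also collides with a `price_range` of 1 as a dictionary key (assuming, as I understand it, that `Dataset` keeps its feature mapping in a dict). The snippet is plain Python; `RAW_ITEMS`, `to_feature_names`, `build_feature_tuples`, `all_feature_names` and `naive_vocabulary` are hypothetical names.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Hypothetical raw rows mirroring the example above: item id -> column values.
RAW_ITEMS: Dict[str, Dict[str, Any]] = {
    "item1": {"price_range": 1, "accept_credit_cards": False, "smoking_allowed": True, "category": "bar"},
    "item2": {"price_range": 4, "accept_credit_cards": True, "smoking_allowed": False, "category": "restaurant"},
    "item3": {"price_range": 3, "accept_credit_cards": True, "smoking_allowed": True, "category": "burgers"},
}


def to_feature_names(columns: Dict[str, Any]) -> List[str]:
    """Turn one item's column values into 'column:value' feature-name strings."""
    # Prefixing with the column keeps smoking_allowed=True and
    # accept_credit_cards=True as two distinct features.
    return [f"{col}:{val}" for col, val in sorted(columns.items())]


def build_feature_tuples(raw: Dict[str, Dict[str, Any]]) -> List[Tuple[str, List[str]]]:
    """Produce (item id, [list of feature names]) tuples."""
    return [(item_id, to_feature_names(cols)) for item_id, cols in raw.items()]


def all_feature_names(tuples: Iterable[Tuple[str, List[str]]]) -> Set[str]:
    """Collect every distinct feature name across all items."""
    return {name for _, names in tuples for name in names}


def naive_vocabulary(raw: Dict[str, Dict[str, Any]]) -> Set[Any]:
    """Distinct bare values, as in the original approach (for comparison)."""
    # True and 1 are equal set members, and the column is lost entirely.
    return {val for cols in raw.values() for val in cols.values()}


if __name__ == "__main__":
    item_tuples = build_feature_tuples(RAW_ITEMS)
    print(item_tuples[0])
    print(len(all_feature_names(item_tuples)), len(naive_vocabulary(RAW_ITEMS)))
```

The tuples have the `(item id, [list of feature names])` shape, so they could go to `Dataset.build_item_features`, while the set from `all_feature_names` is the vocabulary you would pass as `item_features` to `Dataset.fit`. For this toy data the prefixed scheme yields 10 distinct features against only 7 bare values.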
My questions:
- Is this way correct or not?
- Will the position of the `True`/`False` values above be taken into account (i.e. will they be treated as values of two or more different features)? I think that when I pass all possible feature values to the `fit` method of `Dataset`, it will not.
- Should I use the second format described in the docs, `(user id, {feature name: feature weight})`? And if so, what should I do with categories?
- This one isn’t related to my `Dataset` issue, but what are “sane” parameter values when tuning the model’s performance (learning rate, number of components, epochs…)? One of my colleagues told me that he never uses more than 40 components or epochs.
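For ranged, continuous and categorical columns, the `{feature name: feature weight}` form can be mixed with one-hot categories. A minimal sketch, assuming `price_range` is scaled by its maximum (5), booleans are included with weight 1.0 only when true, and each category value becomes its own feature. These encodings are design choices, not something LightFM prescribes, and `PRICE_MAX`, `to_weighted_features` and `build_weighted_tuples` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

PRICE_MAX = 5  # assumption: price_range runs from 1 to 5, as in the question

# Hypothetical raw rows mirroring the example in the question.
RAW_ITEMS: Dict[str, Dict[str, Any]] = {
    "item1": {"price_range": 1, "accept_credit_cards": False, "smoking_allowed": True, "category": "bar"},
    "item2": {"price_range": 4, "accept_credit_cards": True, "smoking_allowed": False, "category": "restaurant"},
    "item3": {"price_range": 3, "accept_credit_cards": True, "smoking_allowed": True, "category": "burgers"},
}


def to_weighted_features(columns: Dict[str, Any]) -> Dict[str, float]:
    """Map one item's columns to a {feature name: feature weight} dict."""
    features: Dict[str, float] = {}
    # Ranged column: one feature whose weight carries the (scaled) value.
    features["price_range"] = columns["price_range"] / PRICE_MAX
    # Boolean columns: present with weight 1.0 only when the flag is True.
    for col in ("accept_credit_cards", "smoking_allowed"):
        if columns[col]:
            features[col] = 1.0
    # Categorical column: one-hot, i.e. a separate feature per category value.
    features[f"category:{columns['category']}"] = 1.0
    return features


def build_weighted_tuples(raw: Dict[str, Dict[str, Any]]) -> List[Tuple[str, Dict[str, float]]]:
    """Produce (item id, {feature name: feature weight}) tuples."""
    return [(item_id, to_weighted_features(cols)) for item_id, cols in raw.items()]


if __name__ == "__main__":
    for item_id, feats in build_weighted_tuples(RAW_ITEMS):
        print(item_id, feats)
```

Keep in mind that `build_item_features` normalizes each row so its weights sum to 1 by default (`normalize=True`), so the relative weights within an item matter more than their absolute scale.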
Thanks and have a nice day!
Issue created 5 years ago; 8 comments.
Top GitHub Comments
Thank you for the quick response!
Just to be sure, the correct “shape” of the features passed to `build_item_features`/`build_user_features` should be (I think you forgot to put the features into a list in your response):
Have a nice day!
Ah, I spent 6 evenings trying to figure out what was wrong with the format I used 😁 The documentation says: `(item id, {feature name: feature weight})`
@maciejkula, could you kindly update the documentation? https://making.lyst.com/lightfm/docs/lightfm.data.html
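Since the thread shows how easy it is to get the input shape wrong, a small pre-flight check can help before calling `Dataset.build_item_features`/`build_user_features`. This is a hypothetical helper (`check_feature_data` is not part of LightFM), deliberately strict: it accepts only an iterable of 2-tuples whose second element is either a list of feature names or a `{feature name: feature weight}` dict with numeric weights, the two forms mentioned above.

```python
from typing import Any, List


def check_feature_data(data: Any) -> List[str]:
    """Hypothetical sanity check for feature data; returns problems (empty = OK)."""
    # A lone (id, features) pair passed instead of an iterable of pairs.
    if isinstance(data, tuple) and len(data) == 2 and isinstance(data[1], (list, dict)):
        return ["expected an iterable of (id, features) tuples, got a single tuple"]
    problems: List[str] = []
    for i, entry in enumerate(data):
        if not (isinstance(entry, tuple) and len(entry) == 2):
            problems.append(f"entry {i}: expected an (id, features) pair, got {entry!r}")
            continue
        feats = entry[1]
        if isinstance(feats, dict):
            # {feature name: feature weight}: every weight should be a number.
            for name, weight in feats.items():
                if not isinstance(weight, (int, float)):
                    problems.append(f"entry {i}: weight for {name!r} is not a number")
        elif not isinstance(feats, list):
            # e.g. a bare string: iterable, but not a list of feature names.
            problems.append(
                f"entry {i}: features should be a list of names or a "
                f"{{name: weight}} dict, got {type(feats).__name__}"
            )
    return problems


if __name__ == "__main__":
    print(check_feature_data([("item1", ["a", "b"]), ("item2", {"a": 0.5})]))
    print(check_feature_data([("item1", "a", "b")]))
```

Being strict (for instance, rejecting a tuple of names) is a choice made here for clarity; LightFM itself may accept other iterables.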
A big thank you to @maciejkula for developing this package!