
Correct way of creating Item/User features with Dataset class


Hi Maciej and everyone else 😃,

I am using LightFM in my school project with the Yelp Academic dataset. I’ve looked at some previous issues, but I think none of them specifically described what I was looking for (if I’m wrong, sorry for the duplicate).

So, I want to incorporate item/user features and create them with the Dataset class, but I don’t know if I’m doing it right. I have created some and everything seems to work, but I don’t know whether it is correct, because the Yelp dataset has various types of feature values: many are just True/False, some fall in a given range or are continuous (e.g. price range or opening hours), and some are categorical.

Currently I am preparing the features as a collection of (item id, [list of feature names]).

Let’s say I want to create features from the columns price_range (range 1-5), accept_credit_cards (bool), smoking_allowed (bool), category (str). The prepared collection of tuples looks like this, for example:

[
  (item1, [1, False, True, bar]),
  (item2, [4, True, False, restaurant]),
  (item3, [3, True, True, burgers]),
  ...
]

My questions:

  1. Is this way correct or not?
  2. Will the position of the True/False values above be taken into account or not (i.e. will they be treated as values of two or more distinct features)? I think they will not be when all possible feature values are passed to the fit method of Dataset.
  3. Should I use the second method described in the docs, (user id, {feature name: feature weight})? But what then about categories?
  4. This one isn’t related to my Dataset issue, but what are “sane” parameters when tuning the performance of the model (learning rate, components, epochs…)? One of my colleagues told me that he doesn’t use more than 40 components or epochs.
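
The concern in question 2 can be illustrated with a small plain-Python sketch. It does not call LightFM at all. It only assumes that a feature vocabulary behaves like a set of distinct values, which is an assumption made for illustration; the `columns` list and the `column:value` naming are likewise hypothetical choices.

```python
# Rows from the question:
# (item id, [price_range, accept_credit_cards, smoking_allowed, category])
rows = [
    ("item1", [1, False, True, "bar"]),
    ("item2", [4, True, False, "restaurant"]),
    ("item3", [3, True, True, "burgers"]),
]

# Assumption: the vocabulary is a set of distinct values, so the position
# of a value inside the list is not recorded anywhere.
bare_vocab = set()
for _, values in rows:
    bare_vocab.update(values)

# True from accept_credit_cards and True from smoking_allowed collapse into
# one entry. In Python 1 == True as well, so price_range 1 merges with True.
print(len(bare_vocab))

# Prefixing each value with its column name keeps the columns apart.
columns = ["price_range", "accept_credit_cards", "smoking_allowed", "category"]
named_vocab = set()
for _, values in rows:
    named_vocab.update(f"{col}:{val}" for col, val in zip(columns, values))
print(sorted(named_vocab))
```

Under that assumption, bare values cannot tell the two boolean columns apart, while name-prefixed values can.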

Thanks and have a nice day!

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8

Top GitHub Comments

bonobo commented, Nov 13, 2018 (9 reactions)

Thank you for the quick response!

Just to be sure, the correct “shape” of the features passed to build_item_features/build_user_features should be the following (I think you forgot to put the features into a list in your response):

[
    (item1, ['price:1', 'accept_credit_cards:False', 'smoking_allowed:True', 'category:bar']),
    (item2, ['price:4', 'accept_credit_cards:True', 'smoking_allowed:False', 'category:restaurant']),
]
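
A minimal sketch of how rows in that shape might be produced and passed to `Dataset` follows. The `to_feature_names` helper, the `COLUMNS` list and the `"user1"` id are hypothetical names introduced only for illustration. The LightFM part is skipped if the package is not installed, and it has not been checked against any particular LightFM version.

```python
COLUMNS = ["price", "accept_credit_cards", "smoking_allowed", "category"]


def to_feature_names(values, columns=COLUMNS):
    """Turn one row of raw values into 'column:value' feature names."""
    return [f"{col}:{val}" for col, val in zip(columns, values)]


raw_rows = [
    ("item1", [1, False, True, "bar"]),
    ("item2", [4, True, False, "restaurant"]),
]

# (item id, [list of feature names]) tuples, as in the comment above.
item_features = [(item_id, to_feature_names(values)) for item_id, values in raw_rows]

# Every distinct feature name, to be declared when fitting the Dataset.
all_feature_names = sorted({name for _, names in item_features for name in names})

try:
    from lightfm.data import Dataset
except ImportError:  # LightFM not installed: only the plain-Python part runs
    Dataset = None

if Dataset is not None:
    dataset = Dataset()
    # "user1" is a placeholder; a real project would pass its actual user ids.
    dataset.fit(
        users=["user1"],
        items=[item_id for item_id, _ in item_features],
        item_features=all_feature_names,
    )
    item_feature_matrix = dataset.build_item_features(item_features)
    print(item_feature_matrix.shape)
```

Declaring the full set of feature names in `fit` before calling `build_item_features` mirrors the question's mention of passing all possible feature values to the fit method.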

Have a nice day!

korlov01 commented, Aug 13, 2020 (7 reactions)

Ah, I spent 6 evenings trying to figure out what’s wrong with the format I used 😁 Documentation says: (item id, {feature name: feature weight})

@maciejkula , could you kindly update the documentation? https://making.lyst.com/lightfm/docs/lightfm.data.html

Big thank you @maciejkula for developing this package!
