Building datasets

Hello!

Thank you for this open source package; it helps a lot and your work is amazing.

I just have a silly question about dataset construction. I followed the example for my data: user (160,000 x 300) and item (4,000 x 4).

dataset = Dataset()
dataset.fit(users=(x['id_user'] for x in user),
            items=(x['id_item'] for x in item),
            user_features=((x['id_user'], [[x[col] for col in list_columns_user]]) for x in user),
            item_features=((x['id_item'], [[x[col] for col in list_columns_item]]) for x in item))

But when I call dataset.user_features_shape() I get (160000, 160000). Shouldn’t I get (160000, 300) instead?

Indeed, the documentation says:

Returns
-------
(num user ids, num user features): tuple of ints

and my number of user features is 300. So is there an error in what I did?

Sorry for the stupid question!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 16

Top GitHub Comments

1 reaction
maciejkula commented, Jun 22, 2018

You need to pass an iterable of tuples of (id, [list of features for that id]) into build_features. It looks like at the moment you’re passing the same features for every user?
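A minimal sketch of the input shape this comment describes, written in plain Python so it runs without LightFM. The comment says build_features; in LightFM the corresponding methods are Dataset.build_user_features and Dataset.build_item_features, while Dataset.fit takes only the feature names. The toy records, the column names, and the two helper functions below are hypothetical, and the sketch assumes the 300 columns hold numeric values that can serve as feature weights.

```python
# Hypothetical toy records standing in for the asker's `user` rows.
users = [
    {"id_user": "u1", "age": 0.3, "tenure": 0.8},
    {"id_user": "u2", "age": 0.5, "tenure": 0.1},
]
list_columns_user = ["age", "tenure"]


def user_feature_names(columns):
    """Feature names to register once, independent of any particular user."""
    return list(columns)


def per_user_features(rows, columns, id_key="id_user"):
    """Yield (user id, {feature name: weight}) tuples, one per row."""
    for row in rows:
        yield row[id_key], {col: row[col] for col in columns}


names = user_feature_names(list_columns_user)
pairs = list(per_user_features(users, list_columns_user))

# With LightFM installed, these would be used roughly as follows (not run here):
#   dataset.fit(users=(r["id_user"] for r in users), items=...,
#               user_features=names)
#   user_features = dataset.build_user_features(pairs)
```

The point of the split is that the feature names are registered once in fit, and the per-user tuples, each with a flat collection of that user's own features, go to build_user_features afterwards.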

1 reaction
maciejkula commented, Jun 22, 2018

(Well, you should get 160000 x 160300 or something like that. Are your feature names the same as some of your user ids?)
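The arithmetic behind this comment can be sketched in plain Python. By default, LightFM's Dataset is created with user_identity_features=True, so every user id also gets its own feature column, and 160,000 ids plus 300 named features give 160,300 columns. The function below is a simplified model of that bookkeeping, not LightFM's code; its name and the generated ids and feature names are hypothetical.

```python
def feature_matrix_shape(user_ids, feature_names, identity_features=True):
    """Return (num user ids, num user features) for a simplified id/feature mapping."""
    id_index = {}
    feature_index = {}
    for uid in user_ids:
        id_index.setdefault(uid, len(id_index))
        if identity_features:
            # Each user id doubles as a per-user indicator feature.
            feature_index.setdefault(uid, len(feature_index))
    for name in feature_names:
        # A feature name equal to an existing key reuses that column.
        feature_index.setdefault(name, len(feature_index))
    return len(id_index), len(feature_index)


user_ids = range(160_000)
feature_names = [f"col_{i}" for i in range(300)]

print(feature_matrix_shape(user_ids, feature_names))
# (160000, 160300)
print(feature_matrix_shape(user_ids, feature_names, identity_features=False))
# (160000, 300)
print(feature_matrix_shape(user_ids, range(300)))
# (160000, 160000): feature names collide with user ids
```

Under this model, the (160000, 300) shape the asker expected corresponds to switching identity features off, and a square (160000, 160000) shape is what appears when every feature name coincides with a user id, which is the situation the comment asks about.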
