[FEATURE] Loading a Dataset as a tuple instead of a dictionary

🚨🚨 Feature Request

  • A new implementation (Improvement, Extension)

Is your feature request related to a problem?

Currently, the generators for the Keras library use batches of (x, y) tuples to feed data to the model. The Hub schemas only let us change the shape and/or datatype of the data within a dictionary. If we could extend this behavior to load data as batches of tuples as well, the Keras integration would be seamless.

If your feature will improve HUB

Hub would be seamlessly integrated with Keras. End users would not need to modify the data further after loading it via Hub when they are using Keras models and generators.

Description of the possible solution

We might need to restructure the APIs that provide the data. Though unlikely, this could be a breaking change.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

3 reactions
mynameisvinn commented, Jan 19, 2021

Gotcha, so something more like this?

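def load_data(ds, batch_size):
    """Return a generator that yields (X, y) batches."""
    n = len(ds)  # assumes len(ds) gives the number of samples
    # Step through the dataset in strides of batch_size.
    for i in range(0, n, batch_size):
        X = ds["data", i: i + batch_size].compute()
        y = ds["label", i: i + batch_size].compute()
        yield X, y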
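
For anyone who wants to try the batching logic without a Hub dataset, here is a minimal, self-contained sketch of the same idea. It assumes a plain dict of lists as a stand-in for the dataset; the function name `iter_batches`, the `x_key`/`y_key` parameters, and the toy data are hypothetical and are not part of Hub or Keras.

```python
from typing import Dict, Iterator, Sequence, Tuple


def iter_batches(
    columns: Dict[str, Sequence],
    batch_size: int,
    x_key: str = "data",
    y_key: str = "label",
) -> Iterator[Tuple[list, list]]:
    """Yield (X, y) tuples holding at most batch_size samples each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n = len(columns[x_key])
    if len(columns[y_key]) != n:
        raise ValueError("data and label columns differ in length")
    # Start at 0 and advance by batch_size; the final batch may be shorter.
    for start in range(0, n, batch_size):
        stop = start + batch_size
        yield list(columns[x_key][start:stop]), list(columns[y_key][start:stop])


# Hypothetical stand-in for a dataset with "data" and "label" columns.
toy = {"data": [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]], "label": [0, 1, 0, 1, 0]}
batches = list(iter_batches(toy, batch_size=2))
```

The important detail is the three-argument `range(0, n, batch_size)`: the start is 0, the stop is the number of samples, and the step is the batch size.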
1 reaction
DebadityaPal commented, Jan 19, 2021

Thanks, @mynameisvinn this works perfectly!

@AbhinavTuli, what I was initially doing was something like this: I converted the whole dataset to (X, y) pairs first.

def convert(ds):
    """Converts the whole dataset into (X, y)."""
    ...
    return (X, y)

(X, y) = convert(ds)
model.fit(X, y)

What I would like to do is:

# Suppose this sets the schema to return tuples.
schema = (np.ndarray(int64), np.ndarray(int64))

ds = Dataset("activeloop/mnist", mode='w', schema=schema)

model.fit(ds, batch_size=64)

I know that schema is supposed to give a structure to a local dataset that one is about to upload, but it would be good if we could use it to restructure retrieved datasets and especially convert them to tuples from dictionaries. However, this problem is completely solved by the code snippet provided by @mynameisvinn, so I don’t think it would be necessary to change the current flow of the software.
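
The dictionary-to-tuple conversion described above can also be done lazily, without touching the schema. The following is only a sketch under assumptions: it treats the dataset as an iterable of per-sample dicts, which is a stand-in and not a statement about Hub's actual API, and the helper name `as_tuples` is hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence, Tuple


def as_tuples(
    samples: Iterable[Dict[str, Any]], keys: Sequence[str]
) -> Iterator[Tuple[Any, ...]]:
    """Lazily convert dict samples into tuples ordered by `keys`."""
    for sample in samples:
        # The order of `keys` fixes the tuple layout, e.g. (x, y).
        yield tuple(sample[key] for key in keys)


# Hypothetical stand-in for samples retrieved as dictionaries.
samples = [
    {"label": 7, "data": [0.1, 0.2]},
    {"label": 3, "data": [0.3, 0.4]},
]
pairs = list(as_tuples(samples, keys=("data", "label")))
```

Because the key order is passed explicitly, the tuple layout does not depend on how the dictionaries happen to be ordered.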
