Legacy interface for prototype datasets?

The prototype datasets change the interface in two ways:

  1. The input parameters are a little different in some cases. For example, in the current API the MNIST dataset is instantiated with datasets.MNIST(..., train=True), whereas now it looks like datasets.load("mnist", split="train").
  2. The output is completely different. Previously we returned a tuple (sometimes of varying length), whereas now we always return a dictionary. Furthermore, we previously used PIL images and NumPy arrays as return types, whereas now we always use tensor subclasses.
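
To make the second point concrete, here is a minimal sketch of how consuming code changes between the two styles. It deliberately avoids torchvision: both samples are stand-ins built from plain Python objects, and the key names "image" and "label" are assumptions for illustration, not something the issue confirms.

```python
# Stand-in samples. Real datasets would yield PIL images (old style)
# or tensor subclasses (new style) instead of nested lists.
old_style_sample = ([[0, 255], [255, 0]], 7)                    # (image, target) tuple
new_style_sample = {"image": [[0, 255], [255, 0]], "label": 7}  # always a dict


def consume_old(sample):
    # Old style: positional unpacking, which breaks if the tuple length varies.
    image, target = sample
    return image, target


def consume_new(sample):
    # New style: access by key, independent of how many fields a sample has.
    # The key names here are hypothetical.
    return sample["image"], sample["label"]
```

Both functions return the same pair for the stand-in data, but every call site that unpacks a tuple has to be rewritten to index by key.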

To lower the burden of moving to the new-style datasets a little, we could add a legacy: bool = False keyword argument to datasets.load(). For every dataset that has a legacy variant, we could simply implement two functions, one that maps the input parameters and one that maps the output. If a dataset doesn’t support a legacy variant, we could simply raise an error.
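
The proposal could be sketched roughly as follows. This is not the torchvision implementation: _load_new_style is a hypothetical stand-in for the new-style loader, the _LEGACY registry and the two mapping functions are invented names, and the sample keys "image" and "label" are assumed. A real output mapping would also have to convert tensor subclasses back to PIL images or NumPy arrays, which is omitted here.

```python
from typing import Any, Dict, Iterator, Tuple

Sample = Dict[str, Any]


def _load_new_style(name: str, *, split: str) -> Iterator[Sample]:
    """Hypothetical stand-in for the new-style loader; yields dict samples."""
    if name != "mnist":
        raise ValueError(f"unknown dataset {name!r}")
    num_samples = 3 if split == "train" else 2
    for i in range(num_samples):
        yield {"image": [[i]], "label": i}


def _mnist_legacy_input(*, train: bool = True) -> Dict[str, Any]:
    # Input mapping: legacy train=True/False -> new split="train"/"test".
    return {"split": "train" if train else "test"}


def _mnist_legacy_output(sample: Sample) -> Tuple[Any, Any]:
    # Output mapping: new dict sample -> legacy (image, target) tuple.
    return sample["image"], sample["label"]


# Hypothetical registry of datasets that have a legacy variant.
_LEGACY = {"mnist": (_mnist_legacy_input, _mnist_legacy_output)}


def load(name: str, *, legacy: bool = False, **kwargs: Any) -> Iterator[Any]:
    if not legacy:
        return _load_new_style(name, **kwargs)
    if name not in _LEGACY:
        # Datasets without a legacy variant simply error out.
        raise ValueError(f"dataset {name!r} has no legacy variant")
    map_input, map_output = _LEGACY[name]
    return map(map_output, _load_new_style(name, **map_input(**kwargs)))
```

With this sketch, load("mnist", split="train") yields dictionaries, while load("mnist", legacy=True, train=False) accepts the old keyword and yields (image, target) tuples. Keeping the mappings in a per-dataset registry means that adding or dropping legacy support for one dataset does not touch load() itself.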

cc @pmeier @bjuncek

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

datumbox commented, Dec 7, 2021 (2 reactions)

Yeah, I think this is a critical detail that needs to be highlighted here. If what we get from the dataset is different, the user still needs to make a bunch of changes to their code to handle this. It’s not just the X part of the training data pair; handling the Y might be necessary as well.

Given the above, I’m not sure that writing extra code to make things look different is worth it, since the APIs need to be handled differently anyway (due to their return types). Instead, one could argue that we should put enough good features in the new API, solving common user issues, to make the migration worth it.

I would love to hear what others think on this. @NicolasHug @prabhat00155 @fmassa?

NicolasHug commented, Dec 7, 2021 (0 reactions)

Overall, I share @datumbox’s points.

I don’t think it’s worth the hassle and extra complexity of maintaining two sets of APIs. If anything, I feel like this would actually hurt the migration, because some users would just do half of it instead of doing it all. And if/when we remove support for the legacy variant, they would have to apply a second set of changes.

“Instead, one could argue that we should be putting enough good features in the new API, solving common user issues to make the migration worth it”

I was going to comment something along these lines before reading it.
