Legacy interface for prototype datasets?
The prototype datasets change the interface in two ways:
- The input parameters are a little different in some cases. For example, in the current API the MNIST dataset would be instantiated with `datasets.MNIST(..., train=True)`, whereas now it looks like `datasets.load("mnist", split="train")`.
- The output is completely different. Before, we returned a tuple (sometimes of varying length), whereas now we always return a dictionary. Furthermore, before we used `PIL` images and `numpy` arrays as return types, whereas now we always use tensor subclasses.
To lower the burden of moving to the new-style datasets a little, we could have a `legacy: bool = False` keyword argument on `datasets.load()`. For all datasets that have a legacy variant, we could simply implement two functions that map the input and output. If a dataset doesn't support a legacy variant, we could simply error out.
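To make the proposal concrete, here is a minimal sketch of how such a `legacy` flag could be wired up. It is not the actual torchvision implementation: `_load_new_style`, `_LEGACY_ADAPTERS`, the two `_mnist_legacy_*` mappers, and the `"image"`/`"label"` dictionary keys are all hypothetical names chosen for illustration, and the loader is a stand-in that yields dummy samples instead of real tensors.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Sample = Dict[str, Any]


def _load_new_style(name: str, *, split: str = "train") -> Iterator[Sample]:
    """Stand-in for the new-style loader (hypothetical): yields dict samples."""
    for label in range(3):
        yield {"image": [[0, 0], [0, 0]], "label": label, "split": split}


def _mnist_legacy_inputs(*, train: bool = True) -> Dict[str, Any]:
    """Map the old-style ``train`` flag to the new-style ``split`` argument."""
    return {"split": "train" if train else "test"}


def _mnist_legacy_outputs(sample: Sample) -> Tuple[Any, Any]:
    """Map a new-style dict sample to the old-style (image, target) tuple."""
    return sample["image"], sample["label"]


# Hypothetical registry: dataset name -> (input mapper, output mapper).
_LEGACY_ADAPTERS: Dict[
    str, Tuple[Callable[..., Dict[str, Any]], Callable[[Sample], Tuple[Any, ...]]]
] = {
    "mnist": (_mnist_legacy_inputs, _mnist_legacy_outputs),
}


def load(name: str, *, legacy: bool = False, **kwargs: Any) -> Iterable[Any]:
    """Load a dataset, optionally through its legacy input/output mapping."""
    if not legacy:
        return _load_new_style(name, **kwargs)
    if name not in _LEGACY_ADAPTERS:
        # Datasets without a legacy variant simply error out.
        raise ValueError(f"Dataset {name!r} does not support legacy=True")
    map_inputs, map_outputs = _LEGACY_ADAPTERS[name]
    new_kwargs = map_inputs(**kwargs)
    return (map_outputs(sample) for sample in _load_new_style(name, **new_kwargs))
```

With this sketch, `load("mnist", legacy=True, train=False)` accepts the old keyword and yields tuples, while `load("mnist", split="test")` yields dictionaries. Note that the output mapper here only reshapes the container; converting tensor subclasses back to `PIL` images or `numpy` arrays would need additional per-dataset code.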
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
Yeah, I think this is a critical detail that needs to be highlighted here. If what we get from the dataset is different, the user still needs to make a bunch of changes to their code to handle this. It's not just the X part of the training data pair; it might be necessary to handle the Y as well.
Given the above, I’m not sure if writing extra code to make things look different is worth it, given that the APIs need to be handled differently (due to their return types). Instead, one could argue that we should be putting enough good features in the new API, solving common user issues to make the migration worth it.
I would love to hear what others think on this. @NicolasHug @prabhat00155 @fmassa?
Overall I share @datumbox's points.
I don't think it's worth the hassle and extra complexity of maintaining two sets of APIs. If anything, I feel like this would actually hurt the migration, because some users would just do half of it instead of doing it all, and if/when we remove support for the legacy variant, they would have to apply a second set of changes.
I was going to comment something along these lines before reading it