port datasets from the old to the new API


The new dataset API is now stable enough to start porting more datasets from the old API. For the 0.13.0 release planned for 2022H2 we want to achieve at least feature parity for the new API. If you want to help out, please comment on the respective issue so we can assign it to you.

The process of adding a dataset to the new API is described here. In addition, we have already ported some datasets that you can use as a reference. In any case, if you are blocked by something, feel free to send a partial PR and ping me there so I can help.

The following datasets need to be ported:

Image classification

Image classification datasets are a good starting point if you are not familiar with the dataset or the new API, since these datasets tend to be the easiest.

Image detection or segmentation

Image detection or segmentation datasets tend to be a little harder, since one needs to merge more information into one sample compared to classification. My suggestion is to pick one of these only if you are familiar with either the dataset or the new API, so you don’t have to manage two things at once.
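To illustrate what "merging more information into one sample" can look like, here is a minimal sketch in plain Python. It does not use the torchvision prototype API: the function name `merge_samples`, the file layout, and the dictionary keys are all hypothetical and only serve as an illustration of pairing each image with its annotation.

```python
from pathlib import PurePosixPath


def merge_samples(image_files, annotation_files):
    """Pair each image with the annotation file that shares its stem.

    Both arguments are iterables of (path, payload) tuples. All names and
    the sample layout are hypothetical, not the torchvision API.
    """
    # Index annotations by file stem, e.g. "annotations/0001.xml" -> "0001".
    annotations = {
        PurePosixPath(path).stem: payload for path, payload in annotation_files
    }

    for path, image in image_files:
        key = PurePosixPath(path).stem
        if key not in annotations:
            raise KeyError(f"no annotation found for image {path!r}")
        # A detection sample carries the image plus its annotation, whereas a
        # classification sample would only need the image and a label.
        yield {"path": path, "image": image, "annotation": annotations[key]}


images = [("images/0001.jpg", b"<jpeg bytes>"), ("images/0002.jpg", b"<jpeg bytes>")]
annotations = [
    ("annotations/0002.xml", "<boxes for 0002>"),
    ("annotations/0001.xml", "<boxes for 0001>"),
]
samples = list(merge_samples(images, annotations))
```

The annotations are matched by key rather than by position, so the two inputs do not need to arrive in the same order.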

Image pairs

We are still designing how exactly image pair datasets should be implemented. I list them here for completeness, but I suggest not picking up any of them until the design is finished.

Video classification

We are still designing how exactly video datasets should be implemented. I list them here for completeness, but I suggest not picking up any of them until the design is finished.

Optical flow

We are still designing how exactly optical flow datasets should be implemented. I list them here for completeness, but I suggest not picking up any of them until the design is finished.

[^1]: These datasets do not provide public download links for the data so they might be harder to work on.
[^2]: These datasets are implemented as classification datasets in the old API, but provide extra annotations for detection or segmentation.
[^3]: Maybe we should have lfw/people, kitti/object, and kitti/flow datasets to cleanly separate the different variants. This also applies to coco as discussed in https://github.com/pytorch/vision/pull/5326#discussion_r796813705

cc @pmeier @bjuncek

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 4
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

4 reactions
pmeier commented, Mar 28, 2022

Hey everyone. We decided to remove the “help wanted” label for now. Reviewing the PRs takes more time out of my schedule than I anticipated. This is not a comment on the quality of the PRs, but rather a late acknowledgement that, due to their diverse nature, datasets are hard to review. We very much appreciate every contribution towards closing this issue.

@yassineAlouini @puhuk @abhi-glitchhg @Dbhasin1 @zhiqwang @Amapocho you all have issues assigned to you for which there is no PR yet. If you haven’t started yet, I suggest not starting until we give another signal here. If you already have an implementation, you can still send a PR and I will try to review it in a timely manner. In case you don’t want to work on the dataset anymore, please comment on the issue so I can un-assign you.

@lezwon @yassineAlouini @vballoli Our decision has no effect on your already open PRs. I’ll review them normally.

4 reactions
pmeier commented, Feb 15, 2022

Hey @Dbhasin1 @abhi-glitchhg @vballoli @Amapocho @vfdev-5. We recently merged #5407, which included some changes to the prototype datasets. I believe the only change that affects you is the removal of the decoder. This means `_make_datapipe` no longer gets passed a `decoder`. Instead, you can `from torchvision.prototype.features import EncodedImage` and use `image = EncodedImage.from_file(buffer)` rather than the old idiom `image = decoder(buffer) if decoder else buffer`.
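The difference between the two idioms can be sketched side by side. This is only an illustration under stated assumptions: the `EncodedImage` class below is a hypothetical stand-in that mimics nothing more than the `from_file` call mentioned in the comment, so the snippet runs without the prototype package, and the helper names `prepare_sample_old` and `prepare_sample_new` are made up for this example.

```python
import io


class EncodedImage:
    """Hypothetical stand-in for torchvision.prototype.features.EncodedImage.

    It only mimics the `from_file` constructor so this sketch is runnable
    on its own; it is not the real class.
    """

    def __init__(self, data: bytes):
        self.data = data

    @classmethod
    def from_file(cls, file):
        # Keep the raw, still-encoded bytes; decoding is left for later.
        return cls(file.read())


def prepare_sample_old(buffer, decoder=None):
    # Old idiom: an optional decoder was passed down to the dataset code.
    image = decoder(buffer) if decoder else buffer
    return {"image": image}


def prepare_sample_new(buffer):
    # New idiom: no decoder argument; the file is always wrapped as encoded.
    image = EncodedImage.from_file(buffer)
    return {"image": image}


old_sample = prepare_sample_old(io.BytesIO(b"raw-bytes"))
new_sample = prepare_sample_new(io.BytesIO(b"raw-bytes"))
```

With the old idiom the type of `image` depends on whether a decoder was supplied; with the new one it is always the same wrapper type.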
