Implement `UnzipperIterDataPipe`
🚀 The feature
Add `unzip`, as requested by both the Vision and Text teams.
```python
dp = IterableWrapper([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
dp0, dp1, dp2 = dp.unzip()
list(dp0)  # [1, 4, 7]
```
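To pin down the intended semantics of the proposed `unzip` call above, here is a minimal eager sketch in plain Python (no torchdata involved). The function name `unzip_eager` and its `sequence_length` parameter are assumptions for illustration, not part of the proposal. The explicit length reflects one practical constraint: `dp0, dp1, dp2 = dp.unzip()` has to return a fixed number of outputs before any data has been read, so a lazy version cannot infer the count without first reading from the source.

```python
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def unzip_eager(rows: Iterable[Sequence[T]], sequence_length: int) -> Tuple[List[T], ...]:
    """Hypothetical helper: split fixed-length rows into one list per position (eager, in memory)."""
    columns: Tuple[List[T], ...] = tuple([] for _ in range(sequence_length))
    for row in rows:
        # Reject ragged input instead of silently dropping or padding values
        if len(row) != sequence_length:
            raise ValueError(f"expected length {sequence_length}, got {len(row)}")
        for i, item in enumerate(row):
            columns[i].append(item)
    return columns


col0, col1, col2 = unzip_eager([(1, 2, 3), (4, 5, 6), (7, 8, 9)], sequence_length=3)
print(col0)  # [1, 4, 7]
```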
We may be able to reuse the logic of `_ChildDataPipe`: https://github.com/pytorch/pytorch/blob/fa38e93fe98bfcbdb11288ecfbe5af5c264aabb3/torch/utils/data/datapipes/iter/combining.py#L145
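One way to read the `_ChildDataPipe` suggestion: a single parent owns the source iterator plus one FIFO buffer per output, and each child asks the parent for its next item, pulling a new row only when its own buffer is empty. The sketch below is a plain-Python analogue of that pattern, not the actual `_ChildDataPipe` code; `_UnzipParent`, `_UnzipChild`, and `unzip_lazy` are hypothetical names, and every row is assumed to have exactly `sequence_length` items.

```python
from collections import deque
from typing import Any, Deque, Iterable, Iterator, List, Sequence, Tuple

_DONE = object()  # sentinel: source exhausted and this child's buffer is empty


class _UnzipParent:
    """Owns the source iterator and one FIFO buffer per output position."""

    def __init__(self, source: Iterable[Sequence[Any]], sequence_length: int) -> None:
        self._it = iter(source)
        self.buffers: List[Deque[Any]] = [deque() for _ in range(sequence_length)]

    def next_for(self, index: int) -> Any:
        buf = self.buffers[index]
        if not buf:
            # Pull one row and fan its items out to every child's buffer
            # (assumes the row has exactly sequence_length items)
            row = next(self._it, _DONE)
            if row is _DONE:
                return _DONE
            for i, item in enumerate(row):
                self.buffers[i].append(item)
        return buf.popleft()


class _UnzipChild:
    """One output stream; it only ever reads its own slot from the parent."""

    def __init__(self, parent: _UnzipParent, index: int) -> None:
        self.parent = parent
        self.index = index

    def __iter__(self) -> Iterator[Any]:
        while True:
            item = self.parent.next_for(self.index)
            if item is _DONE:
                return
            yield item


def unzip_lazy(source: Iterable[Sequence[Any]], sequence_length: int) -> Tuple[_UnzipChild, ...]:
    parent = _UnzipParent(source, sequence_length)
    return tuple(_UnzipChild(parent, i) for i in range(sequence_length))


dp0, dp1, dp2 = unzip_lazy([(1, 2, 3), (4, 5, 6), (7, 8, 9)], 3)
print(list(dp0))  # [1, 4, 7]
print(list(dp2))  # [3, 6, 9]
```

In this sketch each child is single-pass (iterating `dp0` a second time yields nothing) and the buffers are unbounded; a real DataPipe would also need a buffer limit and reset handling.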
Support `tuple`, `list`, and maybe `dict`?
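If `tuple`, `list`, and `dict` are all supported, the main difference is how output slots are identified: positions for sequences, keys for mappings. A minimal sketch, assuming the outputs are determined by the first row and that dict outputs follow that row's key insertion order (both are assumptions; the issue leaves `dict` open). `output_keys` and `split_row` are hypothetical helpers.

```python
from typing import Any, Hashable, List, Mapping, Union

Row = Union[tuple, list, Mapping[Hashable, Any]]


def output_keys(first_row: Row) -> List[Hashable]:
    """Decide the output slots from the first row."""
    if isinstance(first_row, Mapping):
        # dict: one output per key, in the first row's insertion order
        return list(first_row.keys())
    if isinstance(first_row, (tuple, list)):
        # tuple/list: one output per position
        return list(range(len(first_row)))
    raise TypeError(f"expected tuple, list or dict, got {type(first_row).__name__}")


def split_row(row: Row, keys: List[Hashable]) -> List[Any]:
    # row[k] works for both cases: an int position or a dict key
    return [row[k] for k in keys]


rows = [{"image": "img0", "label": 0}, {"image": "img1", "label": 1}]
keys = output_keys(rows[0])
columns = [list(col) for col in zip(*(split_row(r, keys) for r in rows))]
print(keys)     # ['image', 'label']
print(columns)  # [['img0', 'img1'], [0, 1]]
```

Strings are deliberately rejected here even though they are sequences, since splitting a string element into characters is unlikely to be what a user means.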
Motivation, pitch
It's way better than making users write the following script to mimic `unzip`:
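```python
from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

def expand_fn(data):
    # Tag each element with its position within the tuple
    index_data = []
    for i, d in enumerate(data):
        index_data.append((i, d))
    return index_data

dp = dp.flatmap(expand_fn)

def classify(data):
    # Route each (index, value) pair to the output with that index
    return data[0]

dp0, dp1, dp2 = dp.demux(num_instances=3, classifier_fn=classify)
...
```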
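To see why this workaround is clumsy, the following plain-Python simulation (lists instead of DataPipes; not torchdata code) mirrors what `flatmap(expand_fn)` followed by a demux on `classify` produces. Each output receives `(index, value)` pairs rather than bare values, so a further step such as a `map` that drops the index is still needed, and the intermediate stream has one element per value instead of one per row.

```python
from typing import Any, List, Tuple

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]


def expand_fn(data) -> List[Tuple[int, Any]]:
    # Same idea as the issue's helper: tag each element with its position
    return list(enumerate(data))


# Simulates dp.flatmap(expand_fn): one flat stream of (index, value) pairs
flat = [pair for row in rows for pair in expand_fn(row)]

# Simulates demux with classify(pair) == pair[0]
outputs: List[List[Tuple[int, Any]]] = [[] for _ in range(3)]
for pair in flat:
    outputs[pair[0]].append(pair)

print(outputs[0])  # [(0, 1), (0, 4), (0, 7)]

# Each output still carries the index; stripping it is yet another step
dp0 = [value for _, value in outputs[0]]
print(dp0)  # [1, 4, 7]
```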
Alternatives
No response
Additional context
Note: we need to take extra care with `demux`, `fork`, and the future `unzip`. If users write `dp0, dp1, _ = dp.unzip()`, the pipeline may become unserializable after iteration, because the third output DataPipe is never consumed.
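Part of the risk with an ignored output can be illustrated through buffering alone. In the sketch below (a hypothetical `BoundedUnzip` class in plain Python, not the torchdata implementation), every row pulled for one output also deposits items in its siblings' buffers. With `dp0, dp1, _`, the third buffer is never drained, so it hits the cap even though `dp0` and `dp1` are consumed in lockstep. Raising `BufferError` at the cap is a design choice of this sketch, and the serialization problem itself is not modelled.

```python
from collections import deque
from typing import Any, Deque, Iterable, Iterator, List, Sequence


class BoundedUnzip:
    """Hypothetical lazy unzip with a cap on how many items each output may have waiting."""

    def __init__(self, source: Iterable[Sequence[Any]], sequence_length: int, buffer_size: int) -> None:
        self._it = iter(source)
        self.buffer_size = buffer_size
        self.buffers: List[Deque[Any]] = [deque() for _ in range(sequence_length)]

    def child(self, index: int) -> Iterator[Any]:
        buf = self.buffers[index]
        while True:
            if not buf:
                row = next(self._it, None)
                if row is None:
                    return  # source exhausted
                # A sibling that is never read keeps growing; refuse to exceed the cap.
                # (In this sketch the row that triggers the error is dropped.)
                for i, other in enumerate(self.buffers):
                    if i != index and len(other) >= self.buffer_size:
                        raise BufferError(f"output {i} is not being consumed")
                for i, item in enumerate(row):
                    self.buffers[i].append(item)
            yield buf.popleft()


rows = [(i, i * 10, i * 100) for i in range(5)]
unz = BoundedUnzip(rows, sequence_length=3, buffer_size=2)
dp0, dp1, _ = unz.child(0), unz.child(1), unz.child(2)

pairs = []
try:
    for a, b in zip(dp0, dp1):
        pairs.append((a, b))
except BufferError:
    print("stopped after", len(pairs), "rows")  # stopped after 2 rows
```

With a large enough `buffer_size` the loop completes, but the ignored buffer is then left holding one item per row, which is the kind of leftover state that makes such a pipeline awkward to serialize.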
Issue Analytics
- Created 2 years ago
- Reactions: 1
- Comments: 5 (5 by maintainers)
Does it make sense to add an `ignore_idx: Union[int, Tuple[int, ...]]` argument, which allows `unzip` to skip over certain elements within the inner DataPipe? This avoids the problem, so you'd have something like …
Closed by #198
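A minimal eager sketch of the `ignore_idx` suggestion in plain Python: `unzip_with_ignore` is a hypothetical name, `sequence_length` is an assumed extra parameter, and lists stand in for DataPipes. Skipped positions are never stored, so no unconsumed output is left behind. Negative indices are not handled.

```python
from typing import Any, Iterable, List, Sequence, Tuple, Union


def unzip_with_ignore(
    rows: Iterable[Sequence[Any]],
    sequence_length: int,
    ignore_idx: Union[int, Tuple[int, ...]] = (),
) -> Tuple[List[Any], ...]:
    # Normalize a single int to a set, matching the suggested Union[int, Tuple[int, ...]]
    ignored = {ignore_idx} if isinstance(ignore_idx, int) else set(ignore_idx)
    kept = [i for i in range(sequence_length) if i not in ignored]
    outputs: Tuple[List[Any], ...] = tuple([] for _ in kept)
    for row in rows:
        # Only kept positions are copied; ignored ones are dropped immediately
        for out, i in zip(outputs, kept):
            out.append(row[i])
    return outputs


dp0, dp1 = unzip_with_ignore([(1, 2, 3), (4, 5, 6), (7, 8, 9)], 3, ignore_idx=2)
print(dp0, dp1)  # [1, 4, 7] [2, 5, 8]
```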