Implement `UnzipperIterDataPipe`

🚀 The feature

Add unzip as requested by both Vision and Text teams.

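from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
dp0, dp1, dp2 = dp.unzip()  # proposed API: one child DataPipe per tuple position

list(dp0)  # [1, 4, 7]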

We may be able to reuse the logic of _ChildDataPipe: https://github.com/pytorch/pytorch/blob/fa38e93fe98bfcbdb11288ecfbe5af5c264aabb3/torch/utils/data/datapipes/iter/combining.py#L145

Support tuple and list, and maybe dict?
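
To make the requested semantics concrete, here is a minimal plain-Python sketch, not the TorchData implementation. The helper name `unzip` and its `sequence_length` parameter are assumptions chosen for illustration. It keeps one buffer per column so that each output can be consumed independently, and it handles tuples and lists but not dicts.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def unzip(source: Iterable[Sequence[T]], sequence_length: int) -> List[Iterator[T]]:
    """Split an iterable of fixed-length sequences into one iterator per column.

    Hypothetical helper for illustration only.
    """
    source_iter = iter(source)
    # One FIFO buffer per column; values wait here until that column is read.
    buffers: List[Deque[T]] = [deque() for _ in range(sequence_length)]

    def column(idx: int) -> Iterator[T]:
        while True:
            if not buffers[idx]:
                # This column has nothing buffered, so pull the next row
                # from the shared source and distribute it to every column.
                try:
                    row = next(source_iter)
                except StopIteration:
                    return
                if len(row) != sequence_length:
                    raise ValueError(
                        f"expected {sequence_length} elements, got {len(row)}"
                    )
                for buf, value in zip(buffers, row):
                    buf.append(value)
            yield buffers[idx].popleft()

    return [column(i) for i in range(sequence_length)]


dp0, dp1, dp2 = unzip([(1, 2, 3), (4, 5, 6), (7, 8, 9)], sequence_length=3)
first = list(dp0)  # [1, 4, 7]
```

Reading `dp0` to the end buffers the values for the other two columns, which is the trade-off any one-to-many split of a single-pass source has to make.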

Motivation, pitch

It’s much better than making users write the following script to mimic unzip:

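from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

def expand_fn(data):
    # Tag each element with its position: (1, 2, 3) -> [(0, 1), (1, 2), (2, 3)]
    return list(enumerate(data))

dp = dp.flatmap(expand_fn)

def classify(data):
    # Route each (index, value) pair to the child DataPipe for that index
    return data[0]

# demux needs the number of outputs as well as the classifier
dp0, dp1, dp2 = dp.demux(num_instances=3, classifier_fn=classify)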

...

Alternatives

No response

Additional context

Note: We need to take extra care with demux, fork, and the future unzip. If users write dp0, dp1, _ = dp.unzip(), the pipeline may be unserializable after iteration because the third output DataPipe is never iterated.
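
One plausible way to picture the concern: a child that is never iterated leaves the parent holding state on its behalf. The toy class below (`BufferedUnzip` is a made-up name, and this does not model TorchData's serialization logic) only shows that leftover state accumulating when the third output is discarded.

```python
from collections import deque

_DONE = object()  # sentinel marking an exhausted source


class BufferedUnzip:
    """Toy stand-in for a one-to-many DataPipe; illustration only."""

    def __init__(self, source, n):
        self._it = iter(source)
        # Public so the leftover state can be inspected below.
        self.buffers = [deque() for _ in range(n)]

    def child(self, idx):
        while True:
            if not self.buffers[idx]:
                row = next(self._it, _DONE)
                if row is _DONE:
                    return
                # Every child gets its share of the row, read or not.
                for buf, value in zip(self.buffers, row):
                    buf.append(value)
            yield self.buffers[idx].popleft()


parent = BufferedUnzip([(1, 2, 3), (4, 5, 6), (7, 8, 9)], 3)
dp0, dp1, _ = (parent.child(i) for i in range(3))

consumed = list(zip(dp0, dp1))      # [(1, 2), (4, 5), (7, 8)]
leftover = list(parent.buffers[2])  # [3, 6, 9], never drained
```

After the two used children are exhausted, the parent still carries the full third column, which is the kind of dangling state the note warns about.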

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
erip commented, Jan 29, 2022

If users do dp0, dp1, _ = dp.unzip()

Does it make sense to add an ignore_idx: Union[int, Tuple[int, ...]] parameter that lets unzip skip certain elements within the inner dp? This avoids the problem, so you’d have something like

dp0, dp1 = dp.unzip(ignore_idx=2) # or dp.unzip(ignore_idx=(2,)), dp.unzip(ignore_idx=(-1,)), ... 
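
A rough sketch of how such an argument could be normalized, assuming `ignore_idx` accepts either a single int or a tuple of ints and allows negative indices as in the call above. The helper `kept_columns` is hypothetical, and the eager `zip(*rows)` is only there to keep the example small.

```python
from typing import List, Tuple, Union


def kept_columns(
    sequence_length: int, ignore_idx: Union[int, Tuple[int, ...]] = ()
) -> List[int]:
    """Return the column indices that remain after skipping ignore_idx.

    Hypothetical helper; negative indices count from the end.
    """
    if isinstance(ignore_idx, int):
        ignore_idx = (ignore_idx,)
    skipped = set()
    for idx in ignore_idx:
        if not -sequence_length <= idx < sequence_length:
            raise IndexError(f"ignore_idx {idx} out of range for length {sequence_length}")
        skipped.add(idx % sequence_length)  # maps -1 to the last column
    return [i for i in range(sequence_length) if i not in skipped]


rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
columns = list(zip(*rows))  # eager unzip, fine for a small in-memory list
dp0, dp1 = (list(columns[i]) for i in kept_columns(3, ignore_idx=-1))
```

Because the skipped column never gets an output, there is no unused child left behind for the caller to forget about.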

0 reactions
NivekT commented, Feb 10, 2022

Closed by #198
