Implement `UnzipperIterDataPipe`

🚀 The feature

Add unzip as requested by both Vision and Text teams.

from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
dp0, dp1, dp2 = dp.unzip()

list(dp0)  # [1, 4, 7]

We may be able to reuse the logic of _ChildDataPipe: https://github.com/pytorch/pytorch/blob/fa38e93fe98bfcbdb11288ecfbe5af5c264aabb3/torch/utils/data/datapipes/iter/combining.py#L145

Should it support tuple and list, and maybe dict?
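
To make the requested semantics concrete, here is a minimal pure-Python sketch. It is not the TorchData implementation: the name unzip_sketch and its sequence_length argument are assumptions for illustration, and itertools.tee stands in for whatever buffering a real DataPipe would use. It handles tuples and lists; dicts would need key-based selection instead.

```python
from itertools import tee
from operator import itemgetter
from typing import Iterable, Iterator, Sequence, Tuple


def unzip_sketch(source: Iterable[Sequence], sequence_length: int) -> Tuple[Iterator, ...]:
    """Split an iterable of fixed-length sequences into one iterator per position.

    Hypothetical helper for illustration only.
    """
    # tee() buffers items internally so each branch can advance independently
    # while the source is read only once.
    branches = tee(source, sequence_length)
    # Branch i yields only element i of every sequence.
    return tuple(map(itemgetter(i), branch) for i, branch in enumerate(branches))


dp0, dp1, dp2 = unzip_sketch([(1, 2, 3), (4, 5, 6), (7, 8, 9)], sequence_length=3)
print(list(dp0))  # [1, 4, 7]
print(list(dp1))  # [2, 5, 8]
print(list(dp2))  # [3, 6, 9]
```

Because every branch shares one pass over the source, a branch that is never read forces the others' items to stay buffered, which is the same class of concern raised for demux and fork.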

Motivation, pitch

It’s far better than making users write the following script to mimic unzip:

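from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

def expand_fn(data):
    # Tag each element with its position inside the tuple.
    return list(enumerate(data))

dp = dp.flatmap(expand_fn)

def classify(data):
    # Route each (index, value) pair by its index.
    return data[0]

# demux needs the number of output DataPipes as well as the classifier.
dp0, dp1, dp2 = dp.demux(num_instances=3, classifier_fn=classify)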

...

Alternatives

No response

Additional context

Note: We need to take extra care with demux, fork, and the future unzip. If users write dp0, dp1, _ = dp.unzip(), the pipeline may become unserializable after iteration because the third output DataPipe is never iterated over.

Issue Analytics

State:
Created 2 years ago
Reactions: 1
Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction

erip commented, Jan 29, 2022

If users do dp0, dp1, _ = dp.unzip()

Does it make sense to add an ignore_idx: Union[int, Tuple[int, ...]] argument that lets unzip skip certain elements of each sequence in the inner dp? That avoids the problem, so you’d have something like:

dp0, dp1 = dp.unzip(ignore_idx=2) # or dp.unzip(ignore_idx=(2,)), dp.unzip(ignore_idx=(-1,)), ...
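
One way to picture this suggestion is the pure-Python sketch below. It is only an illustration under stated assumptions: unzip_skipping and sequence_length are hypothetical names, ignore_idx follows the signature proposed in this comment, and itertools.tee stands in for the real child-DataPipe machinery. Skipped positions never get an output iterator, so nothing is left unconsumed by design.

```python
from itertools import tee
from operator import itemgetter
from typing import Iterable, Iterator, Sequence, Tuple, Union


def unzip_skipping(
    source: Iterable[Sequence],
    sequence_length: int,
    ignore_idx: Union[int, Tuple[int, ...]] = (),
) -> Tuple[Iterator, ...]:
    """Return one iterator per position, omitting the positions in ignore_idx.

    Hypothetical helper for illustration only.
    """
    if isinstance(ignore_idx, int):
        ignore_idx = (ignore_idx,)
    skipped = set()
    for idx in ignore_idx:
        if not -sequence_length <= idx < sequence_length:
            raise IndexError(f"ignore_idx {idx} is out of range")
        # Normalize negative indices, e.g. -1 becomes sequence_length - 1.
        skipped.add(idx % sequence_length)
    kept = [i for i in range(sequence_length) if i not in skipped]
    # Create branches only for the kept positions.
    branches = tee(source, len(kept))
    return tuple(map(itemgetter(i), branch) for i, branch in zip(kept, branches))


data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
dp0, dp1 = unzip_skipping(data, sequence_length=3, ignore_idx=2)
print(list(dp0))  # [1, 4, 7]
print(list(dp1))  # [2, 5, 8]
```

Passing ignore_idx=(-1,) gives the same two outputs, since -1 is normalized to the last position.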

0 reactions

NivekT commented, Feb 10, 2022

Closed by #198
