question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. ItĀ collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Add a flatmap datapipe

See original GitHub issue

šŸš€ The feature

Add a FlatMapDataPipe which simulataneously flattens nested IterDataPipes and applies a function to nested pipes.

Motivation, pitch

I have tarballs containing tarballs. When dealing with these types of recursive data, it’s useful to have a mechanism to both flatten the structure and apply a function to it so I only need to consider the ā€œinner-mostā€ data.

Alternatives

Domain libraries can define their own impls.

Additional context

Migrating torchtext to datapipes via torchdata would benefit from this feature when dealing with IWSLT data.

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:5 (5 by maintainers)

github_iconTop GitHub Comments

1reaction
ejguancommented, Jan 24, 2022

Technical speaking, this flatmap can also be applied in your use case. The function becomes a generator yielding data…

0reactions
eripcommented, Jan 24, 2022

Yes, the pseudocode I have in mind is basically

# [ 1.tgz, 2.tgz, ..., n.tgz]
inner_tars = outer_tar.read_from_tar()

# [ 1_1.txt, 1_2.txt, 2_1.txt, 2_2.txt, 3_1.txt, ...]. Need a better way than a lambda
inner_inner_files = inner_tars.flatmap(lambda inner_tar: FileOpener(inner_tar).read_from_tar())

So the flatmap’d read_from_tar becomes the generator yielding data and flatmap flattens it into a IterableDataPipe. I think the test I’ve written doesn’t actually reflect this usecase 😬 but that’s a thought for the PR

Read more comments on GitHub >

github_iconTop Results From Across the Web

FlatMapper — TorchData main documentation - PyTorch
Applies a function over each item from the source DataPipe, then flattens the outputs to a single, unnested IterDataPipe (functional name: flatmap )....
Read more >
How to create DataPipe that best optimize the map-transform ...
I've tried to process the X efficiently by kind of using .flatmap() but I don't even know whether I'm really using the autobot_vectorizeĀ ......
Read more >
Chainer/Concater from single datapipe? Ā· Issue #648 - GitHub
The Concater datapipe takes multiple DPs as input. Is there a class that would take a single datapipe of iterables instead?
Read more >
Java 8 flatMap example - Mkyong.com
This article explains the Java 8 Stream flatMap() and how to use it. ... we can loop the 2d array and put all...
Read more >
Data Pipe | Vis Data
The following example shows how to use a Data Pipe to divide app specific and Vis ... .flatMap 's argument is a function...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found