Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Graph traversal is broken for custom iter datapipes

from torch.utils.data.graph import traverse
from torchdata.datapipes.iter import IterDataPipe, IterableWrapper


class CustomIterDataPipe(IterDataPipe):
    def noop(self, x):
        return x

    def __init__(self):
        # self.noop is a bound method, so the .map() datapipe keeps a reference back to self
        self._dp = IterableWrapper([]).map(self.noop)

    def __iter__(self):
        yield from self._dp


traverse(CustomIterDataPipe())
# RecursionError: maximum recursion depth exceeded

Without the .map() call, it works fine. I don't think this is specific to .map(), though: from trying a few datapipes, this always happens whenever self._dp is composed from another datapipe in some way.
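The comments below attribute the failure to a cyclic reference: `self.noop` is a bound method, so the `.map()` datapipe stored in `self._dp` points back to the `CustomIterDataPipe` that owns it, and a traversal that follows every reference without remembering where it has been never terminates. The pure-Python model below sketches that failure mode and one way to break it. `Node`, `Wrapper`, `Mapper`, `Custom`, `CustomStatic`, `children`, `naive_traverse` and `safe_traverse` are all hypothetical stand-ins; they are not torchdata classes and not the actual `torch.utils.data.graph.traverse` implementation.

```python
from typing import Any, Dict, FrozenSet, List


class Node:
    """Hypothetical stand-in for a datapipe (not torchdata's IterDataPipe)."""


class Wrapper(Node):
    def __init__(self, items: List[Any]) -> None:
        self.items = items


class Mapper(Node):
    def __init__(self, source: Node, fn: Any) -> None:
        self.source = source
        self.fn = fn  # if fn is a bound method, fn.__self__ is its instance


class Custom(Node):
    def noop(self, x: Any) -> Any:
        return x

    def __init__(self) -> None:
        # Passing the bound method self.noop creates the back-reference
        self._dp = Mapper(Wrapper([]), self.noop)


class CustomStatic(Node):
    @staticmethod
    def noop(x: Any) -> Any:
        return x

    def __init__(self) -> None:
        # A staticmethod is looked up as a plain function: no reference to self
        self._dp = Mapper(Wrapper([]), self.noop)


def children(obj: Node) -> List[Node]:
    """Nodes reachable from obj's attributes, following bound methods to their instance."""
    found = []
    for value in vars(obj).values():
        target = getattr(value, "__self__", value)
        if isinstance(target, Node):
            found.append(target)
    return found


def naive_traverse(node: Node) -> Dict[str, Any]:
    # No cycle detection: Custom -> Mapper -> Custom -> ... until RecursionError
    return {type(node).__name__: [naive_traverse(c) for c in children(node)]}


def safe_traverse(node: Node, path: FrozenSet[int] = frozenset()) -> Dict[str, Any]:
    # Track ids of the nodes on the current path and skip edges back to them
    path = path | {id(node)}
    return {
        type(node).__name__: [
            safe_traverse(c, path) for c in children(node) if id(c) not in path
        ]
    }
```

In this model, `naive_traverse(Custom())` raises `RecursionError`, `safe_traverse(Custom())` returns `{'Custom': [{'Mapper': [{'Wrapper': []}]}]}`, and `naive_traverse(CustomStatic())` terminates because the map function no longer points back at its owner. Tracking only the ancestors on the current path, rather than every node ever visited, breaks cycles while still listing a shared child under each parent that uses it.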

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 24 (21 by maintainers)

Top GitHub Comments

2 reactions
NivekT commented, Mar 30, 2022

> Thanks @NivekT
>
> This should mean that your code snippet above will work if dill is not installed.
>
> This means that the cyclic reference issue is still happening for users who have dill installed, right? Unfortunately, I'm not sure we can expect torchvision's users not to have dill installed.
>
> Do you think there is a way to fix the dill serialization to properly handle the cyclic reference issue?

Yep, the PR is meant to be a quick fix to unblock your work and I marked that as a TODO. I agree with your point and am looking into it.
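For context on the serialization side of this exchange: Python's standard `pickle` records each object it has already written in a memo keyed by identity, so an object whose attribute is one of its own bound methods round-trips without infinite recursion. The sketch below uses a hypothetical `Holder` class to illustrate only that standard-library memo behaviour; it does not show how dill behaves or what the fix in the PR looks like.

```python
import pickle
from typing import Any


class Holder:
    """Hypothetical class whose attribute is one of its own bound methods."""

    def noop(self, x: Any) -> Any:
        return x

    def __init__(self) -> None:
        self.fn = self.noop  # self -> bound method -> self: a reference cycle


original = Holder()
assert original.fn.__self__ is original

# pickle memoizes `original` before writing its __dict__, so when it reaches
# the bound method (reduced to getattr(obj, "noop")) it emits a back-reference
data = pickle.dumps(original)
restored = pickle.loads(data)

# The cycle is rebuilt on the copy rather than pointing at the original
print(restored.fn.__self__ is restored)  # True
print(restored.fn(42))  # 42
```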

2 reactions
NicolasHug commented, Mar 30, 2022

@ejguan @NivekT, going back to @pmeier's comment https://github.com/pytorch/data/issues/237#issuecomment-1080651807:

  • Does torchdata have to support dill?
  • If not, would we consider merging Philip’s proposed diff for fixing the cyclic reference issue?
  • If yes, would you mind providing some details on why dill is needed? Hopefully we can still find a fix for the cyclic reference issue while supporting dill?

Fixing the cyclic reference issue would allow us to move forward with our preferred design for torchvision's new datasets.
