[RFC] Disable the multiple Iterators per IterDataPipe (Make Iterator singleton)
This is the initial draft. I will complete it shortly.
The state of the Iterator is attached to each IterDataPipe instance. This is super useful for:
- Determinism
- Snapshotting
- Benchmarking -> It becomes easier to register each DataPipe since each one has a different ID in the graph.
Implementation Options:
- Each DataPipe has an attribute `_iterator` as the placeholder for `__iter__` calls.
- Implement `__next__`. (My preference)
  - It would make the instance picklable. Previously, the generator function (`__iter__`) was not picklable. This helps multiprocessing and snapshotting.
  - `__iter__` returns `self` (`Forker(self)` may be another option, not 100% sure).
  - IMO, this is super useful as we can track the number of `__next__` calls to do a fast forward. The state of iteration is attached to the DataPipe instance, rather than to a temporary instance created from `__iter__`, whose internal state we couldn't track. (We can easily track states like RNG, iteration number, buffer, etc. as they are going to be attached to the `self` instance.)
  - As the source DataPipe is attached to each DataPipe but the actual iteration happens at the Iterator level, the graph constructed by DataLoaderV2 doesn't match the actual execution graph.
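To make the `__next__` option concrete, here is a minimal pure-Python sketch. It assumes a hypothetical `CountingPipe` class (not the torchdata `IterDataPipe`) whose iteration state is a plain integer stored on the instance. The helper `fast_forward` is also an invented name for illustration.

```python
import pickle


class CountingPipe:
    """Hypothetical stand-in for an IterDataPipe; not the torchdata class."""

    def __init__(self, source):
        self.source = list(source)  # picklable source data
        self._index = 0             # iteration state lives on the instance

    def __iter__(self):
        # Singleton iterator: the instance is its own iterator.
        return self

    def __next__(self):
        if self._index >= len(self.source):
            raise StopIteration
        item = self.source[self._index]
        self._index += 1
        return item

    def fast_forward(self, n):
        # Replay n __next__ calls to reach a recorded position.
        for _ in range(n):
            next(self)


pipe = CountingPipe(range(5))
first_two = [next(pipe), next(pipe)]

# The whole object, including its position, round-trips through pickle.
restored = pickle.loads(pickle.dumps(pipe))
rest = list(restored)


def gen():
    yield 1


# A generator object, by contrast, cannot be pickled.
try:
    pickle.dumps(gen())
    generator_picklable = True
except TypeError:
    generator_picklable = False
```

Because the position is an ordinary attribute, a snapshot is just the pickled instance, and restoring it resumes from the same element. A real DataPipe would also need to carry RNG and buffer state in the same way.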
DataLoader triggers an error if there are two DataPipe instances with the same ID in the graph. (Another option is for DataLoader to fork automatically.) Users who want a single DataPipe to appear twice in the graph should use Forker for it.
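The duplicate check described here could look roughly like the sketch below. It is not DataLoader's actual graph traversal: `Node`, its `sources` attribute, and `check_no_shared_pipes` are hypothetical names, and the graph is assumed to be a simple tree of upstream references.

```python
class Node:
    """Hypothetical DataPipe-like node; `sources` holds its upstream nodes."""

    def __init__(self, name, *sources):
        self.name = name
        self.sources = sources


def check_no_shared_pipes(root):
    """Raise if the same instance is reachable through more than one edge."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            raise RuntimeError(
                f"{node.name!r} appears more than once in the graph; "
                "fork it explicitly"
            )
        seen.add(id(node))
        stack.extend(node.sources)


# One source consumed by two branches that are later combined.
source = Node("source")
zipped = Node("zip", Node("a", source), Node("b", source))

try:
    check_no_shared_pipes(zipped)
    shared_detected = False
except RuntimeError:
    shared_detected = True
```

A linear chain passes the check, while the `zipped` graph above is rejected because `source` is reached from both branches. The automatic-fork alternative would replace the `raise` with inserting a fork at that node.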
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:22 (22 by maintainers)
Note to future readers, these linked PRs contain BC-breaking notes that describe the behavior before and after in detail:
For the PR, could you verify all our existing customers’ code would behave normally?
Just ignore my argument above. I misunderstood the approach in the PR. We can leave the API as it is. But, we need to document the behavior of the singleton iterator.
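For documentation purposes, the singleton-iterator behavior can be illustrated with a small sketch. This is an assumption about how such a constraint might be enforced, not the code from the PR: `SingletonIterPipe` and `_valid_iterator_id` are hypothetical names, and the error type is chosen only for illustration.

```python
class SingletonIterPipe:
    """Hypothetical sketch; not the torchdata implementation."""

    def __init__(self, source):
        self.source = source
        self._valid_iterator_id = 0

    def __iter__(self):
        # Creating a new iterator invalidates any earlier one.
        self._valid_iterator_id += 1
        return self._generate(self._valid_iterator_id)

    def _generate(self, iterator_id):
        for item in self.source:
            if iterator_id != self._valid_iterator_id:
                raise RuntimeError("This iterator was invalidated by a newer one")
            yield item


pipe = SingletonIterPipe(range(3))
it1 = iter(pipe)
first = next(it1)   # it1 is the only iterator, so this works

it2 = iter(pipe)    # creating it2 invalidates it1
second = next(it2)

try:
    next(it1)
    old_iterator_failed = False
except RuntimeError:
    old_iterator_failed = True
```

In this sketch the public API is unchanged (`iter(pipe)` still returns an iterator); only the older iterator stops working once a newer one exists.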