ShufflerIterDataPipe.set_shuffle_settings(True) gets overridden by DataLoader
Sorry if this isn't the right place for this bug report; please let me know.
I think there’s an issue with the shuffling API. Doing
dp = ShufflerIterDataPipe(dp)
dp.set_shuffle_settings(True)
won't actually shuffle anything unless shuffle=True is also passed to DataLoader.
This is because of
https://github.com/pytorch/pytorch/blob/3399876306772738936596f1711ec1b7f41495c8/torch/utils/data/dataloader.py#L231-L232
and
https://github.com/pytorch/pytorch/blob/3399876306772738936596f1711ec1b7f41495c8/torch/utils/data/graph_settings.py#L30-L35
and because DataLoader's shuffle argument defaults to False.
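To make the interaction concrete, here is a minimal toy model in plain Python (no PyTorch needed). ToyShuffler and toy_loader are hypothetical stand-ins, not PyTorch APIs, and the real DataLoader/graph_settings code is more involved. The only behavior carried over from the report is that the loader forwards its own shuffle argument, which defaults to False, onto the shuffler, so a setting made on the datapipe is silently replaced.

```python
import random
from typing import Iterable, Iterator, List


class ToyShuffler:
    """Hypothetical stand-in for ShufflerIterDataPipe (not the real class)."""

    def __init__(self, source: Iterable[int], seed: int = 0) -> None:
        self.source = list(source)
        self.shuffle_enabled = False  # toy default only; the real class's default is not modeled
        self._rng = random.Random(seed)

    def set_shuffle_settings(self, shuffle: bool) -> "ToyShuffler":
        self.shuffle_enabled = shuffle
        return self

    def __iter__(self) -> Iterator[int]:
        items = list(self.source)
        if self.shuffle_enabled:
            self._rng.shuffle(items)
        return iter(items)


def toy_loader(pipe: ToyShuffler, shuffle: bool = False) -> List[int]:
    """Hypothetical loader: always pushes its own shuffle flag onto the shuffler."""
    pipe.set_shuffle_settings(shuffle)  # the default False silently wins
    return list(pipe)


if __name__ == "__main__":
    dp = ToyShuffler(range(10))
    dp.set_shuffle_settings(True)   # the user asks for shuffling on the datapipe
    out = toy_loader(dp)            # shuffle left at its default (False)
    print(dp.shuffle_enabled)       # False: the user's setting was overridden
    print(out == list(range(10)))   # True: items come back in their original order
```

In this model the only way to get shuffled output is to also pass shuffle=True to the loader, which matches the symptom described above.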
Issue Analytics
- Created 2 years ago
- Comments: 9 (8 by maintainers)
The DataLoader shouldn't silently override what was set on the datapipe. One option would be to change the default of DataLoader's shuffle argument to None instead of False. I believe this was the assumption behind the code at https://github.com/pytorch/pytorch/blob/3399876306772738936596f1711ec1b7f41495c8/torch/utils/data/graph_settings.py#L30-L35
Closing as it’s fixed by https://github.com/pytorch/pytorch/pull/75505