ShufflerIterDataPipe.set_shuffle_settings(True) gets overridden by DataLoader

See original GitHub issue

Sorry if this isn’t the right place for this bug report, please let me know.

I think there’s an issue with the shuffling API. Doing

dp = ShufflerIterDataPipe(dp)
dp.set_shuffle_settings(True)

won’t actually shuffle anything unless shuffle=True is also passed to DataLoader.

This is because of

https://github.com/pytorch/pytorch/blob/3399876306772738936596f1711ec1b7f41495c8/torch/utils/data/dataloader.py#L231-L232 and https://github.com/pytorch/pytorch/blob/3399876306772738936596f1711ec1b7f41495c8/torch/utils/data/graph_settings.py#L30-L35, combined with the fact that DataLoader's shuffle parameter defaults to False. The DataLoader therefore always pushes its own shuffle value onto the datapipe, even when the caller never passed one.
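To make the mechanism concrete, here is a minimal sketch of the override in plain Python. It does not import or reproduce PyTorch: ToyShuffler, ToyLoader and apply_shuffle are hypothetical stand-ins that only model the behaviour described above (a loader whose shuffle defaults to False, and a helper that overrides the datapipe whenever shuffle is not None).

```python
import random


class ToyShuffler:
    """Hypothetical stand-in for a shuffling datapipe (not PyTorch code)."""

    def __init__(self, source, seed=0):
        self.source = list(source)
        self.enabled = True
        self.seed = seed

    def set_shuffle_settings(self, enabled):
        self.enabled = enabled

    def __iter__(self):
        items = list(self.source)
        if self.enabled:
            random.Random(self.seed).shuffle(items)
        return iter(items)


def apply_shuffle(pipe, shuffle):
    # Models the check quoted in the issue: override unless shuffle is None.
    if shuffle is not None:
        pipe.set_shuffle_settings(shuffle)


class ToyLoader:
    """Hypothetical loader; shuffle defaults to False, as in the report."""

    def __init__(self, pipe, shuffle=False):
        # False is not None, so the default always overrides the pipe.
        apply_shuffle(pipe, shuffle)
        self.pipe = pipe

    def __iter__(self):
        return iter(self.pipe)


pipe = ToyShuffler(range(10))
pipe.set_shuffle_settings(True)          # the user asks for shuffling
loader = ToyLoader(pipe)                 # no shuffle argument passed
print(pipe.enabled)                      # False
print(list(loader) == list(range(10)))   # True: data comes out in order
```

In this toy model the user's setting only survives if shuffle=True is also passed to the loader, which matches the behaviour reported in the issue.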

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, Mar 15, 2022

The DataLoader shouldn’t silently override what was set on the datapipe. Perhaps one way would be to change the default of DataLoader’s shuffle to None instead of False. I believe this was the assumption behind using

if shuffle is not None:
    # override here

from https://github.com/pytorch/pytorch/blob/3399876306772738936596f1711ec1b7f41495c8/torch/utils/data/graph_settings.py#L30-L35
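A rough sketch of what that suggestion could look like, again using hypothetical toy classes rather than PyTorch's actual code: with shuffle defaulting to None, the loader leaves the datapipe's own setting alone and overrides it only when the caller passes an explicit value. ToyPipe, PatchedToyLoader and flag_after_loader are names made up for this illustration.

```python
class ToyPipe:
    """Hypothetical datapipe that only records its shuffle flag."""

    def __init__(self):
        self.shuffle_enabled = False

    def set_shuffle_settings(self, enabled):
        self.shuffle_enabled = enabled


class PatchedToyLoader:
    """Hypothetical loader whose shuffle defaults to None (no opinion)."""

    def __init__(self, pipe, shuffle=None):
        if shuffle is not None:
            # Override only when the caller passed an explicit value.
            pipe.set_shuffle_settings(shuffle)
        self.pipe = pipe


def flag_after_loader(**loader_kwargs):
    """Enable shuffling on a pipe, wrap it in a loader, return the flag."""
    pipe = ToyPipe()
    pipe.set_shuffle_settings(True)
    PatchedToyLoader(pipe, **loader_kwargs)
    return pipe.shuffle_enabled


print(flag_after_loader())               # True: user setting preserved
print(flag_after_loader(shuffle=False))  # False: explicit override
print(flag_after_loader(shuffle=True))   # True
```

The trade-off in this sketch is that None becomes a third state meaning "not specified", so an explicit shuffle=False still wins over the datapipe's setting.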

0 reactions
ejguan commented, Apr 20, 2022