Suggestion: remove the `default` parameter of `ShufflerIterDataPipe`
IIUC, the `default` parameter of `ShufflerIterDataPipe` is intended to allow users to disable shuffling and potentially re-enable it manually later. This seems to be supported already through the `set_shuffle_setting(False)` method, namely:
dp = ShufflerIterDataPipe(dp, default=False)
# is equivalent to
dp = ShufflerIterDataPipe(dp)
dp.set_shuffle_setting(False)
So I would suggest removing the `default` parameter to simplify the interface of `ShufflerIterDataPipe`, as its use case is already natively supported in a simple way. Going further, perhaps we could also let `set_shuffle_setting` return `self`, so that one can just do the following one-liner:
dp = ShufflerIterDataPipe(dp).set_shuffle_setting(False)
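To make the chaining idea concrete, here is a minimal sketch using a toy class. `ToyShuffler` is a hypothetical stand-in, not the real `ShufflerIterDataPipe`: it shuffles the whole input at once rather than using a buffer, and it uses the `set_shuffle` name only for illustration. The only point is that returning `self` from the setter is what makes the one-liner possible.

```python
import random
from typing import Any, Iterable, Iterator


class ToyShuffler:
    """Hypothetical stand-in for ShufflerIterDataPipe (not the torch class)."""

    def __init__(self, source: Iterable[Any], seed: int = 0) -> None:
        self.source = source
        self.seed = seed
        self._enabled = True  # shuffling is on unless explicitly disabled

    def set_shuffle(self, shuffle: bool = True) -> "ToyShuffler":
        self._enabled = shuffle
        return self  # returning self is what allows method chaining

    def __iter__(self) -> Iterator[Any]:
        items = list(self.source)
        if self._enabled:
            # Simplification: shuffle everything at once, no buffer.
            random.Random(self.seed).shuffle(items)
        return iter(items)


# Construct and disable shuffling in a single expression.
dp = ToyShuffler(range(5)).set_shuffle(False)
print(list(dp))  # the source order is kept because shuffling is disabled
```

Without the `return self`, the expression above would bind `dp` to `None`, which is why the two-statement form is currently required.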
Happy to submit a PR for this if you’re OK with the proposal.
CC @pmeier with whom I just discussed this offline.
EDIT: another suggestion from Philip:
Maybe also rename to set_shuffle since the settings part sounds like I’m able to set multiple things
Sounds great to me. Eliminating this argument would also help us unify the API with TorchArrow’s shuffle.
Great.
And, for DataLoader, we still need to change the default value of its `shuffle` argument to None, so that it does not override the DataPipe’s setting when users don’t pass `shuffle`.
Closed in pytorch/pytorch#74370.
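The `None` default mentioned for DataLoader can be sketched as follows. This is only an illustration under assumptions: `ToyPipe` and `toy_loader` are hypothetical names and do not reflect how DataLoader is actually implemented. The idea is that `None` means "the user did not say", so the loader leaves the pipe's own setting alone.

```python
from typing import Optional


class ToyPipe:
    """Hypothetical pipe that carries its own shuffle setting."""

    def __init__(self) -> None:
        self.shuffle_enabled = True

    def set_shuffle(self, shuffle: bool = True) -> "ToyPipe":
        self.shuffle_enabled = shuffle
        return self


def toy_loader(pipe: ToyPipe, shuffle: Optional[bool] = None) -> ToyPipe:
    """Hypothetical loader: only touch the pipe if shuffle was given."""
    if shuffle is not None:
        # An explicit True/False from the caller takes precedence.
        pipe.set_shuffle(shuffle)
    return pipe


# The user disabled shuffling on the pipe and passed nothing to the loader,
# so the pipe's setting is preserved.
kept = toy_loader(ToyPipe().set_shuffle(False))
print(kept.shuffle_enabled)
```

With a default of `False` instead of `None`, the loader could not tell "not provided" apart from "explicitly disabled" and would have to either always override or never override.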