Do not wrap existing DistributedSampler in DistributedProxySampler in auto_dataloader()
🚀 Feature
I've looked at the implementation of ignite.distributed.auto_dataloader() and found that when the data loader has a sampler set, it always wraps this sampler in a DistributedProxySampler. This works fine when the sampler is a plain Sampler that does not itself handle the distribution of samples across processes. However, in my case I want to distribute samples differently across processes to minimize cache misses with my custom map-style dataset, which buffers chunks of data. This places some constraints on how indices should be grouped based on rank, i.e., which indices should be assigned to the same process (see the sketch below). Currently, the only workaround is to use an iterable dataset instead.
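For illustration, a minimal sketch of such a rank-aware sampler. `ChunkedDistributedSampler` and its `chunk_size` parameter are hypothetical names, not part of ignite or torch:

```python
from torch.utils.data import Dataset
from torch.utils.data.distributed import DistributedSampler

class ChunkedDistributedSampler(DistributedSampler):
    """Hypothetical sampler: keeps contiguous chunks of indices on the
    same rank, so a chunk-buffered dataset sees fewer cache misses."""

    def __init__(self, dataset: Dataset, chunk_size: int = 1024, **kwargs):
        super().__init__(dataset, **kwargs)
        self.chunk_size = chunk_size

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        # Split the index range into contiguous chunks ...
        chunks = [indices[i:i + self.chunk_size]
                  for i in range(0, len(indices), self.chunk_size)]
        # ... and hand each rank every num_replicas-th chunk, so the
        # indices within a chunk are never split across processes.
        own = [i for chunk in chunks[self.rank::self.num_replicas] for i in chunk]
        # Padding/truncation to exactly self.num_samples is omitted for brevity.
        return iter(own)
```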
However, I would think that these lines: https://github.com/pytorch/ignite/blob/0f7819a5d2f5c17697eaf4f01848a54ed43e6e53/ignite/distributed/auto.py#L86-L89 could be changed to keep the user-provided instance (of a subclass) of DistributedSampler if one is set.
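A minimal sketch of that change, assuming the wrapping logic can be factored into a helper. `_maybe_wrap_sampler` is a hypothetical name, and `nproc`/`rank` stand for the current world size and process rank used by the linked source:

```python
from typing import Optional
from torch.utils.data import Sampler
from torch.utils.data.distributed import DistributedSampler
from ignite.distributed.auto import DistributedProxySampler

def _maybe_wrap_sampler(sampler: Optional[Sampler], nproc: int, rank: int) -> Optional[Sampler]:
    """Sketch of the proposed behavior for auto_dataloader(): wrap plain
    samplers, but keep a user-provided DistributedSampler (or any
    subclass of it) untouched."""
    if sampler is None or isinstance(sampler, DistributedSampler):
        return sampler
    return DistributedProxySampler(sampler, num_replicas=nproc, rank=rank)
```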
Issue Analytics
- Created: 2 years ago
- Comments: 6 (2 by maintainers)
Ok, I've submitted two separate PRs: one that just raises an error in DistributedProxySampler to avoid wrapping a DistributedSampler, and another that modifies auto_dataloader() to allow users to provide their own distributed sampler.
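A minimal sketch of what the first PR's guard could look like; the exception type and message here are illustrative, not the merged code:

```python
from typing import Optional
from torch.utils.data import Sampler
from torch.utils.data.distributed import DistributedSampler

class DistributedProxySampler(DistributedSampler):
    """Only the proposed guard is shown; the rest of the class is elided."""

    def __init__(self, sampler: Sampler, num_replicas: Optional[int] = None,
                 rank: Optional[int] = None):
        if isinstance(sampler, DistributedSampler):
            # A DistributedSampler already partitions indices across ranks;
            # proxying it would re-partition an already-partitioned stream.
            raise TypeError("Argument sampler should not be a DistributedSampler")
        super().__init__(sampler, num_replicas=num_replicas, rank=rank, shuffle=False)
```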
@aschuh-hf thanks for the FR and for giving the context, it helps to understand!

Yes, I agree that we can think about adding this possibility to skip wrapping sampler with DistributedProxySampler when it is already a DistributedSampler.
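With that skip in place, usage could look roughly like this, reusing the hypothetical ChunkedDistributedSampler sketched above:

```python
import torch
import ignite.distributed as idist

# Stand-in for the chunk-buffered map-style dataset from the issue.
dataset = torch.arange(10_000)

# Hypothetical end state: pass an already-distributed sampler and
# auto_dataloader() keeps it instead of wrapping it in a proxy.
sampler = ChunkedDistributedSampler(
    dataset,
    chunk_size=1024,
    num_replicas=idist.get_world_size(),
    rank=idist.get_rank(),
)
loader = idist.auto_dataloader(dataset, batch_size=32, sampler=sampler)
```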