
DistributedProxySampler RuntimeError when indices are padded

🐛 Bug description

The RuntimeError raised by DistributedProxySampler at line 241 shouldn't occur, since the indices are padded with the full sample (behaviour that was updated because of this comment).
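
For reference, here is a minimal sketch of the arithmetic involved (illustrative only, not the actual ignite source): when the wrapped sampler is padded by repeating the full sample, the padded length can overshoot total_size, which is exactly what the length check then rejects.

import math

# Illustrative sketch only, not the ignite implementation.
num_samples = 100                 # size of the wrapped WeightedRandomSampler
num_replicas = 8
total_size = math.ceil(num_samples / num_replicas) * num_replicas  # 13 * 8 = 104

indices = list(range(num_samples))           # stands in for list(self.sampler)
while len(indices) < total_size:
    indices += list(range(num_samples))      # pad with the full sample again

print(len(indices), total_size)              # 200 vs 104 -> trips the length check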

Environment

  • PyTorch Version (e.g., 1.4):
  • Ignite Version (e.g., 0.3.0):
  • OS (e.g., Linux):
  • How you installed Ignite (conda, pip, source):
  • Python version:
  • Any other relevant information:

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
ryanwongsa commented, Jul 9, 2020

@vfdev-5 I created PR #1192 with the changes you described above. The test has also been updated to reflect the newer test you described earlier. I am not sure if the PR is how you wanted it, so it would be good to get feedback. Thanks

1 reaction
ryanwongsa commented, Jul 9, 2020

Taking the example from the unit test and setting num_replicas to 8 produces the error:

from ignite.distributed.auto import DistributedProxySampler
import torch
from torch.utils.data import WeightedRandomSampler

# Weighted sampler over 100 items; the first 50 get double weight
weights = torch.ones(100)
weights[:50] += 1
num_samples = 100
sampler = WeightedRandomSampler(weights, num_samples)

# One proxy sampler per rank
num_replicas = 8
dist_samplers = [DistributedProxySampler(sampler, num_replicas=num_replicas, rank=i) for i in range(num_replicas)]

torch.manual_seed(0)
true_indices = list(sampler)

# Collect the indices produced by every rank for epoch 0
indices_per_rank = []
for s in dist_samplers:
    s.set_epoch(0)
    indices_per_rank += list(s)

# Every index drawn by the base sampler should be covered across ranks
assert set(indices_per_rank) == set(true_indices)

The error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-d02cd2dd1018> in <module>
     17 for s in dist_samplers:
     18     s.set_epoch(0)
---> 19     indices_per_rank += list(s)

/opt/conda/lib/python3.7/site-packages/ignite/distributed/auto.py in __iter__(self)
    240 
    241         if len(indices) != self.total_size:
--> 242             raise RuntimeError("{} vs {}".format(len(indices), self.total_size))
    243 
    244         # subsample

RuntimeError: 200 vs 104

The assert will also fail after fixing the RuntimeError, but that is because of the padding.
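
As a rough back-of-envelope check (my reading of the padding described above, not the actual ignite code): total_size = ceil(100 / 8) * 8 = 104, while the wrapped sampler only yields 100 indices, so 4 of the subsampled positions come from the padded portion; if those slots are filled from an extra draw of the weighted sampler, they need not appear in true_indices, which would break the set comparison.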
