assert len(indices) == self.total_size error during multiple GPU training
I am trying to train my dataset on 8 GPUs. However, after calling ./dist_train.sh, this assertion error appears:
Traceback (most recent call last):
  File "./tools/train.py", line 113, in <module>
    main()
  File "./tools/train.py", line 109, in main
    logger=logger)
  File "/mmdetection/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/mmdetection/mmdet/apis/train.py", line 186, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/opt/conda/lib/python3.6/site-packages/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/mmcv/runner/runner.py", line 260, in train
    for i, data_batch in enumerate(data_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 493, in __init__
    self._put_indices()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 591, in _put_indices
    indices = next(self.sample_iter, None)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 172, in __iter__
    for idx in self.sampler:
  File "/mmdetection/mmdet/datasets/loader/sampler.py", line 138, in __iter__
    assert len(indices) == self.total_size
…
In the config I tried various values for imgs_per_gpu and workers_per_gpu; currently they are imgs_per_gpu=2 and workers_per_gpu=2. No setting worked, though. Single-GPU training works well.
What is the meaning of this assert? Thanks!
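For context on what the assert checks: mmdetection's distributed sampler buckets images by an aspect-ratio flag and pads each bucket so its length is divisible by the global batch size (imgs_per_gpu × number of GPUs); the assert verifies that the concatenated indices reach exactly that padded total. Below is a minimal sketch of that padding logic, not the actual sampler code; the function name group_indices and its exact padding scheme are illustrative assumptions.

```python
import math
import numpy as np

def group_indices(flags, samples_per_gpu, num_replicas):
    """Illustrative sketch (not mmdetection's real sampler): bucket
    images by aspect-ratio flag (e.g. 0 for w > h, 1 for h >= w) and
    pad each bucket to a multiple of samples_per_gpu * num_replicas,
    so every GPU gets full, same-size batches."""
    flags = np.asarray(flags)
    multiple = samples_per_gpu * num_replicas
    indices = []
    total_size = 0
    for flag in np.unique(flags):
        idx = np.where(flags == flag)[0]
        # round the bucket size up to a multiple of the global batch size
        padded = int(math.ceil(len(idx) / multiple)) * multiple
        total_size += padded
        # repeat indices cyclically to reach the padded length
        idx = np.resize(idx, padded)
        indices.extend(idx.tolist())
    # the assert from the traceback: it fires when the per-bucket counts
    # do not add up, e.g. when image width/height annotations are invalid
    # and an image lands in no bucket (or the flags array is stale)
    assert len(indices) == total_size
    return indices, total_size
```

In this sketch the assert can only fail if the flags array disagrees with the dataset length, which is why corrupt or mis-annotated image sizes (width/height swapped or missing) are a common trigger in practice.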
Issue Analytics
- Created 4 years ago
- Comments: 13 (3 by maintainers)
Top GitHub Comments
I met the same issue; how can I fix it?
Then I deleted the w > h pics and got another error: TypeError: 'NoneType' object is not subscriptable