Distributed training error on the nuScenes dataset

See original GitHub issue

Hi, thanks for the excellent work!
However, the error below occurred when I ran this command: CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh configs/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py 2

Traceback (most recent call last):
  File "./tools/train.py", line 170, in <module>
    main()
  File "./tools/train.py", line 166, in main
    meta=meta)
  File "/home/private/Software/mmdetection/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle dict_keys objects

Is there any solution to this problem?
My mmdet3d version is 0.6.1.
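
For anyone hitting this outside of mmdet3d: the failure is in Python's pickling of a dict_keys view, which is triggered here because the DataLoader worker processes are started with the 'spawn' method (visible in the popen_spawn_posix frames of the traceback). A minimal, standalone reproduction, with a purely illustrative dictionary and no mmdet3d code involved:

import pickle

d = {'car': 0, 'pedestrian': 1}

# dict_keys views cannot be pickled in Python 3.7, which is exactly the
# error shown in the traceback above.
try:
    pickle.dumps(d.keys())
except TypeError as e:
    print(e)  # can't pickle dict_keys objects

# Converting the view to a list makes it picklable again; if you can locate
# where a dict_keys view is stored on the dataset or pipeline, wrapping it
# in list() is another way out.
pickle.dumps(list(d.keys()))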

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

9 reactions
pankongpc commented, Mar 24, 2022

Same problem; setting num_workers to 0 works, but is there any better solution?

Add torch.multiprocessing.set_start_method('fork') to train.py, like this:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
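
For context on why changing the start method helps: with 'spawn', the whole Process object (including the arguments handed to the DataLoader worker) is pickled before the child starts, and that is where the dict_keys view fails; with 'fork' (POSIX only), the child inherits the parent's memory and nothing is pickled. A small, self-contained sketch of the difference, using illustrative names rather than mmdet3d code:

import multiprocessing as mp

def worker(keys):
    # Prove the child received the object that 'spawn' cannot pickle.
    print(list(keys))

if __name__ == '__main__':
    d = {'a': 1, 'b': 2}

    # 'spawn' pickles the Process object and its args, so dict_keys raises TypeError.
    try:
        mp.get_context('spawn').Process(target=worker, args=(d.keys(),)).start()
    except TypeError as e:
        print('spawn failed:', e)

    # 'fork' copies the parent's memory, so no pickling is needed and this works.
    p = mp.get_context('fork').Process(target=worker, args=(d.keys(),))
    p.start()
    p.join()
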
0 reactions
wongsinglam commented, Aug 12, 2022

Same problem! But it is solved with this:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
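
For completeness, the num_workers=0 workaround mentioned in the first comment avoids the problem because no worker subprocesses are started, so nothing has to be pickled. In mmdet3d-style configs this is, as far as I can tell, controlled by workers_per_gpu in the data dict; the fragment below is only an illustrative sketch, not the project's documented fix:

# In the training config: load data in the main process.
data = dict(
    samples_per_gpu=2,   # per-GPU batch size (illustrative value)
    workers_per_gpu=0,   # maps to DataLoader num_workers=0
)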

Read more comments on GitHub >

Top Results From Across the Web

Attribute-Distributed Learning: Models, Limits, and Algorithms
Abstract—This paper introduces a framework for distributed learning (regression) on attribute-distributed data…
Read more >
nuScenes prediction tutorial
The goal of the nuScenes prediction challenge is to predict the future location of agents in the nuScenes dataset. Agents are indexed by...
Read more >
Inference and train with existing models and standard datasets
Train predefined models on standard datasets. MMDetection3D implements distributed training and non-distributed training, which uses MMDistributedDataParallel ...
Read more >
nuScenes: A Multimodal Dataset for ... - CVF Open Access
Using the lidar baseline we examine the importance of pre-training when training a detector on nuScenes. No pretraining means weights are initialized randomly…
Read more >
nuScenes: A multimodal dataset for autonomous driving
Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for...
Read more >
