Distributed training error on the nuScenes dataset

See original GitHub issue

Hi, thanks for the excellent work!
However, the error below occurred when I ran this command: CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh configs/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py 2

Traceback (most recent call last):
  File "./tools/train.py", line 170, in <module>
    main()
  File "./tools/train.py", line 166, in main
    meta=meta)
  File "/home/private/Software/mmdetection/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle dict_keys objects

Is there any solution to this problem?
My mmdet3d version is 0.6.1.
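
For anyone hitting this outside of mmdet3d: the failure is in Python's pickling of a dict_keys view, which is triggered here because the DataLoader worker processes are started with the 'spawn' method (visible in the popen_spawn_posix frames of the traceback). A minimal, standalone reproduction, with a purely illustrative dictionary and no mmdet3d code involved:

import pickle

d = {'car': 0, 'pedestrian': 1}

# dict_keys views cannot be pickled in Python 3.7, which is exactly the
# error shown in the traceback above.
try:
    pickle.dumps(d.keys())
except TypeError as e:
    print(e)  # can't pickle dict_keys objects

# Converting the view to a list makes it picklable again; if you can locate
# where a dict_keys view is stored on the dataset or pipeline, wrapping it
# in list() is another way out.
pickle.dumps(list(d.keys()))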

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

9 reactions
pankongpc commented, Mar 24, 2022

Same problem; setting num_workers to 0 works, but is there any better solution?

Add torch.multiprocessing.set_start_method('fork') to train.py, like this:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
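
For context on why changing the start method helps: with 'spawn', the whole Process object (including the arguments handed to the DataLoader worker) is pickled before the child starts, and that is where the dict_keys view fails; with 'fork' (POSIX only), the child inherits the parent's memory and nothing is pickled. A small, self-contained sketch of the difference, using illustrative names rather than mmdet3d code:

import multiprocessing as mp

def worker(keys):
    # Prove the child received the object that 'spawn' cannot pickle.
    print(list(keys))

if __name__ == '__main__':
    d = {'a': 1, 'b': 2}

    # 'spawn' pickles the Process object and its args, so dict_keys raises TypeError.
    try:
        mp.get_context('spawn').Process(target=worker, args=(d.keys(),)).start()
    except TypeError as e:
        print('spawn failed:', e)

    # 'fork' copies the parent's memory, so no pickling is needed and this works.
    p = mp.get_context('fork').Process(target=worker, args=(d.keys(),))
    p.start()
    p.join()
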
0 reactions
wongsinglam commented, Aug 12, 2022

Same problem! But it is solved with this:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
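
For completeness, the num_workers=0 workaround mentioned in the first comment avoids the problem because no worker subprocesses are started, so nothing has to be pickled. In mmdet3d-style configs this is, as far as I can tell, controlled by workers_per_gpu in the data dict; the fragment below is only an illustrative sketch, not the project's documented fix:

# In the training config: load data in the main process.
data = dict(
    samples_per_gpu=2,   # per-GPU batch size (illustrative value)
    workers_per_gpu=0,   # maps to DataLoader num_workers=0
)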

Read more comments on GitHub >

Top Results From Across the Web

Attribute-Distributed Learning: Models, Limits, and Algorithms
Abstract—This paper introduces a framework for distributed learning (regression) on attribute-distributed data…
Read more >
nuScenes prediction tutorial
The goal of the nuScenes prediction challenge is to predict the future location of agents in the nuScenes dataset. Agents are indexed by...
Read more >
Inference and train with existing models and standard datasets
Train predefined models on standard datasets. MMDetection3D implements distributed training and non-distributed training, which uses MMDistributedDataParallel ...
Read more >
nuScenes: A Multimodal Dataset for ... - CVF Open Access
Using the lidar baseline we examine the importance of pre-training when training a detector on nuScenes. No pretraining means weights are initialized randomly…
Read more >
nuScenes: A multimodal dataset for autonomous driving
Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for...
Read more >
