Distributed training error on nuScenes dataset
Hi, thanks for the excellent work!
But the error below occurred when I ran this command: CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh configs/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py 2
Traceback (most recent call last):
File "./tools/train.py", line 170, in <module>
main()
File "./tools/train.py", line 166, in main
meta=meta)
File "/home/private/Software/mmdetection/mmdet/apis/train.py", line 150, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
for i, data_batch in enumerate(self.data_loader):
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
w.start()
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/home/private/Software/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle dict_keys objects
Is there any solution to this problem?
My mmdet3d version is 0.6.1.
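The traceback points to a general Python limitation rather than anything nuScenes-specific: dict_keys view objects cannot be pickled, and with the 'spawn' start method every dataloader worker process pickles whatever the dataset object references. A minimal illustration of the failure and of the usual workaround of converting the view to a list (the cfg dict below is made up for illustration, it is not taken from mmdet3d):

import pickle

cfg = {'classes': {'car': 0, 'pedestrian': 1}}

keys_view = cfg['classes'].keys()      # a dict_keys view
try:
    pickle.dumps(keys_view)            # fails, as in the traceback above
except TypeError as err:
    print(err)                         # can't pickle dict_keys objects

pickle.dumps(list(cfg['classes'].keys()))  # a plain list pickles fine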
Add
torch.multiprocessing.set_start_method('fork')
in train.py, like this:

Same problem! But it is solved with this:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
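For reference, a minimal sketch of how that workaround could sit at the entry point of tools/train.py, assuming the script already defines a main() function as the traceback shows; with the 'fork' start method the dataloader worker processes inherit the dataset objects instead of pickling them:

import torch.multiprocessing

if __name__ == '__main__':
    # Fork the workers rather than spawn them, so whatever object holds the
    # unpicklable dict_keys view is never sent through pickle.
    # Pass force=True if a start method has already been set elsewhere.
    torch.multiprocessing.set_start_method('fork')
    main()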