OSError: [Errno 12] Cannot allocate memory
See original GitHub issue

```
(open-mmlab_ldh5) ➜ mmdetection git:(master) ✗ CUDA_VISIBLE_DEVICES=4,5,6,7 ./tools/dist_train.sh ./configs/rpc/faster_rcnn_r50_fpn_1x.py 4 --validate
2019-05-24 20:08:24,708 - INFO - Distributed training: True
2019-05-24 20:08:25,313 - INFO - load model from: modelzoo://resnet50
2019-05-24 20:08:25,611 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias
missing keys in source state_dict: layer2.2.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, bn1.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer3.2.bn1.num_batches_tracked
loading annotations into memory…
loading annotations into memory…
loading annotations into memory…
loading annotations into memory…
Done (t=202.67s)
creating index…
index created!
Done (t=254.98s)
creating index…
index created!
Done (t=278.15s)
creating index…
Done (t=279.31s)
creating index…
index created!
index created!
loading annotations into memory…
loading annotations into memory…
loading annotations into memory…
loading annotations into memory…
Done (t=1.17s)
creating index…
index created!
Done (t=1.26s)
creating index…
index created!
Done (t=1.36s)
creating index…
index created!
Done (t=1.82s)
creating index…
index created!
2019-05-24 20:13:14,064 - INFO - Start running, host: ices@ices-SYS-4028GR-TR, work_dir: /home/ices/andrewjyz/Projects/detection/2019-5-23-18-56/mmdetection/work_dirs/faster_rcnn_r50_fpn_1x
2019-05-24 20:13:14,065 - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "./tools/train.py", line 95, in <module>
    main()
  File "./tools/train.py", line 91, in main
    logger=logger)
  File "/home/ices/andrewjyz/Projects/detection/2019-5-23-18-56/mmdetection/mmdet/apis/train.py", line 59, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/ices/andrewjyz/Projects/detection/2019-5-23-18-56/mmdetection/mmdet/apis/train.py", line 171, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/mmcv/runner/runner.py", line 356, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/mmcv/runner/runner.py", line 258, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 469, in __init__
    w.start()
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 59, in _launch
    cmd, self._fds)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/util.py", line 420, in spawnv_passfds
    False, False, None)
OSError: [Errno 12] Cannot allocate memory
```
My dataset is in COCO format and the JSON file includes "segmentation" data. The training JSON file is 7.0 GB and there are 100,000 images (image size 1851×1851). When I start training, the dataset cannot be loaded and the error above appears.
My server has 252 GB of RAM; the GPUs are GeForce GTX 1080Ti cards with 11178 MiB of memory each. Is the whole dataset loaded into memory at once during training? If the data is too big, how should I train?
I hope someone can help me solve this problem, thanks.
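To the question above: the annotations are not streamed. `CocoDataset` hands the whole JSON file to pycocotools, which parses it into RAM up front, and because the traceback shows workers being spawned (`popen_spawn_posix`), each DataLoader worker is a separate process that needs its own copy of the dataset object. With a 7 GB annotation file, four training processes, and several workers per GPU, that multiplies quickly, which is why the crash happens at `w.start()`. A first thing to try is lowering the worker count in the config. Below is a minimal sketch of the relevant `data` section, assuming the usual 2019-era mmdetection config layout; the dataset paths are placeholders, not the actual files:

```python
# Sketch of the `data` section in a config like configs/rpc/faster_rcnn_r50_fpn_1x.py
# (placeholder paths). Lowering workers_per_gpu reduces the number of extra processes
# that each hold a full copy of the parsed annotation file.
data = dict(
    imgs_per_gpu=2,      # batch size per GPU
    workers_per_gpu=0,   # 0 = load batches in the main process; raise again once memory allows
    train=dict(
        type='CocoDataset',
        ann_file='data/rpc/annotations/instances_train.json',  # placeholder
        img_prefix='data/rpc/train/',                           # placeholder
        img_scale=(1333, 800),
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True))
```

Since Faster R-CNN does not use masks (`with_mask=False`), another option worth considering is stripping the "segmentation" field from the training JSON beforehand, which should shrink the 7 GB file and the in-memory index considerably.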
Top GitHub Comments
I ran into the same problem and solved it by adding swap space.
Step-by-step solution:
1. Create a swap file of about 2 GB: `dd if=/dev/zero of=/var/swap bs=1024 count=2048000`
2. Set it up as swap: `mkswap /var/swap`
3. Activate it: `swapon /var/swap`
Good luck!
@mzk665 Thanks for providing this solution.
However, I cannot activate the swap file. The error message is as follows:
Perhaps a Docker setting is the reason I can't activate it. Are you working in a Docker environment? Do you know how to solve this?