
OSError: [Errno 12] Cannot allocate memory

See original GitHub issue

```
(open-mmlab_ldh5) ➜ mmdetection git:(master) ✗ CUDA_VISIBLE_DEVICES=4,5,6,7 ./tools/dist_train.sh ./configs/rpc/faster_rcnn_r50_fpn_1x.py 4 --validate
2019-05-24 20:08:24,708 - INFO - Distributed training: True
2019-05-24 20:08:25,313 - INFO - load model from: modelzoo://resnet50
2019-05-24 20:08:25,611 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer2.2.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, bn1.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer3.2.bn1.num_batches_tracked

loading annotations into memory… loading annotations into memory… loading annotations into memory… loading annotations into memory…
Done (t=202.67s) creating index… index created!
Done (t=254.98s) creating index… index created!
Done (t=278.15s) creating index…
Done (t=279.31s) creating index… index created! index created!
loading annotations into memory… loading annotations into memory… loading annotations into memory… loading annotations into memory…
Done (t=1.17s) creating index… index created!
Done (t=1.26s) creating index… index created!
Done (t=1.36s) creating index… index created!
Done (t=1.82s) creating index… index created!
2019-05-24 20:13:14,064 - INFO - Start running, host: ices@ices-SYS-4028GR-TR, work_dir: /home/ices/andrewjyz/Projects/detection/2019-5-23-18-56/mmdetection/work_dirs/faster_rcnn_r50_fpn_1x
2019-05-24 20:13:14,065 - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "./tools/train.py", line 95, in <module>
    main()
  File "./tools/train.py", line 91, in main
    logger=logger)
  File "/home/ices/andrewjyz/Projects/detection/2019-5-23-18-56/mmdetection/mmdet/apis/train.py", line 59, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/ices/andrewjyz/Projects/detection/2019-5-23-18-56/mmdetection/mmdet/apis/train.py", line 171, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/mmcv/runner/runner.py", line 356, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/mmcv/runner/runner.py", line 258, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 469, in __init__
    w.start()
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 59, in _launch
    cmd, self._fds)
  File "/home/ices/andrewjyz/miniconda3/envs/open-mmlab_ldh5/lib/python3.7/multiprocessing/util.py", line 420, in spawnv_passfds
    False, False, None)
OSError: [Errno 12] Cannot allocate memory
```

My dataset is in COCO format, and the JSON file includes "segmentation" data. The training JSON file is 7.0 GB and there are 100,000 images (image size 1851×1851). When I start training, the dataset cannot be loaded and the error above appears.

My server has 252 GB of RAM. The GPUs are GeForce GTX 1080 Ti cards, each with 11178 MiB of memory. Is all of the data loaded into memory at once during training? If the data is too large, how should I train?

I hope someone can help me solve the problem, thanks.
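For context on where this fails: the traceback dies in `w.start()`, i.e. while the DataLoader is launching its worker processes, at a point where four training processes already hold the parsed 7.0 GB annotation file. Besides adding swap space (see the comments below), a common mitigation is to reduce the number of DataLoader workers per GPU in the config. The snippet below is only a sketch of the relevant `data` section in an mmdetection-v1-era config; the field names, paths, and values are assumptions based on configs of that generation, not values taken from this issue:

```python
# Hypothetical sketch of the `data` section of an mmdetection-v1-style config
# (e.g. configs/rpc/faster_rcnn_r50_fpn_1x.py). Paths and dataset layout are
# placeholders; workers_per_gpu is the knob being illustrated.
dataset_type = 'CocoDataset'
data_root = 'data/rpc/'          # assumed dataset location

data = dict(
    imgs_per_gpu=2,              # batch size per GPU
    workers_per_gpu=1,           # fewer worker processes per GPU -> lower peak host memory
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train.json',
        img_prefix=data_root + 'train/',
    ),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val.json',
        img_prefix=data_root + 'val/',
    ),
)
```

Setting `workers_per_gpu=0` should keep data loading entirely in the main process (PyTorch's `num_workers=0` mode); it costs throughput, but it is a quick way to confirm that worker start-up is what triggers the ENOMEM.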

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

4 reactions
mzk665 commented, Sep 24, 2020


I found the same problem and solved it by expanding the swap partition size.

Step-by-step solution:

  1. Create a swap file of the desired size, e.g. 2 GB: `dd if=/dev/zero of=/var/swap bs=1024 count=2048000`

  2. Set it up as swap space: `mkswap /var/swap`

  3. Activate the swap file: `swapon /var/swap` (a quick check that it worked is sketched after this comment)

Good luck!
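As a quick sanity check before re-launching training, the swap added above should show up in /proc/meminfo. Below is a minimal, dependency-free Python sketch (Linux only; the field names are standard /proc/meminfo keys, nothing from this thread):

```python
# Minimal sketch: confirm the swap file created above is active by reading
# /proc/meminfo (Linux only, no third-party dependencies).

def read_meminfo(path="/proc/meminfo"):
    """Return /proc/meminfo as a dict of {field: size in KiB}."""
    info = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key.strip()] = int(rest.strip().split()[0])  # values are reported in kB
    return info

if __name__ == "__main__":
    m = read_meminfo()
    for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree",
                "CommitLimit", "Committed_AS"):
        print(f"{key:>13}: {m.get(key, 0) / 1024:10.0f} MiB")
```

If `SwapTotal` stays at 0 after running `swapon`, the swap file was not actually activated, which is exactly what the next comment runs into inside Docker.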

0 reactions
AkihiroSasabe commented, Oct 30, 2020

@mzk665 Thanks for providing this solution.

I cannot activate the swap partition. The error message is as follows:

```
root@mmdetection20200628:/mmdetection# swapon /var/swap
swapon: /var/swap: swapon failed: Operation not permitted
```

Perhaps the Docker settings are the reason I can't activate the file. Are you working in a Docker environment? Do you know how to solve this problem?

Read more comments on GitHub >

