Out of memory during training
I am running out of memory on every epoch. I have merged the AN4 and TED datasets and am training on the merged set, and the run hits an out-of-memory error each epoch:
Epoch: [4][13/4336] Time 0.538 (0.680) Data 0.003 (0.003) Loss 58.5850 (69.9133)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train.py", line 304, in <module>
loss.backward()
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/generic/THCStorage.cu:58
Is there any way to set a maximum per-process GPU memory in PyTorch, similar to TF's:
sess_config = tf.ConfigProto()
sess_config.gpu_options.per_process_gpu_memory_fraction = 0.90
sess = tf.Session(config=sess_config)
Fortunately I am able to resume from checkpoints. This seems related to issue #172.
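For reference: PyTorch 0.4.x has no equivalent of TensorFlow's per_process_gpu_memory_fraction; it allocates GPU memory on demand, so an OOM here usually means the batch genuinely does not fit or tensors are being retained across iterations. Newer releases (1.8 and later) do expose a per-process cap. A minimal sketch, assuming a recent PyTorch and a single GPU (not applicable to the 0.4.0 setup in this issue):

import torch

if torch.cuda.is_available():
    # Cap this process at ~90% of GPU 0's memory; allocations past the cap
    # raise an out-of-memory error instead of consuming the rest of the card.
    torch.cuda.set_per_process_memory_fraction(0.9, device=0)

Note that, unlike the TF option, this caps PyTorch's caching allocator rather than pre-reserving a memory pool up front.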
Issue Analytics
- Created: 5 years ago
- Comments: 14 (4 by maintainers)
Top Results From Across the Web
- CUDA out of memory during training - PyTorch Forums: "Hello, I am pretty new to machine learning and I am facing an issue I cannot solve by myself. I took this code..."
- Cuda out of memory during evaluation but training is fine: "Hi, I am finetuning a BARTForConditionalGeneration model. I am using Trainer from the library to train so I do not use anything fancy..."
- Resolving CUDA Being Out of Memory With Gradient ...: "Implementing gradient accumulation and automatic mixed precision to solve CUDA out of memory issue when training big deep learning models..."
- Runtime error: CUDA out of memory by the end of training and ...: "The problem is your loss_train list, which stores all losses from the beginning of your experiment. If the losses you put in were..."
- Out of memory during training - Jetson Nano: "I am following the 'Hello AI world' of Nvidia on my new Jetson-Nano dev kit (4GB). In the 3rd video (here), ..."
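Two of the results above point at the same pair of fixes: gradient accumulation (train with smaller batches but step the optimizer less often) and storing losses as plain floats rather than tensors, since a list of loss tensors keeps every iteration's autograd graph alive. A minimal sketch of both; the model/criterion/optimizer/loader names are generic placeholders, not taken from the issue:

import torch

def train_one_epoch(model, criterion, optimizer, loader, accum_steps=4):
    model.train()
    losses = []                                   # plain floats only, no graphs kept
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        inputs, targets = inputs.cuda(), targets.cuda()
        loss = criterion(model(inputs), targets) / accum_steps
        loss.backward()                           # gradients accumulate in .grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
        losses.append(loss.item() * accum_steps)  # .item() detaches from the graph
    return sum(losses) / len(losses)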
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I'm having the same issue - it has nothing to do with the batch size; GPU memory keeps increasing regardless. I am using CUDA 8 and PyTorch 0.4.0 with Python 3.5. Has anyone figured out a solution to this?

Same issue here on PyTorch 1.0.0 with the latest warp-ctc and the latest pytorch audio. CUDA goes OOM irrespective of layer dimensions or batch size.
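Since both comments report memory growing regardless of batch size, a cheap first check is to log allocator statistics every few iterations and confirm whether usage really climbs monotonically; if it does, something in the loop (accumulated loss tensors, stored outputs, retained graphs) is the likely culprit rather than the batch itself. A small diagnostic helper, sketched here rather than taken from the thread:

import torch

def log_gpu_memory(step, device=0):
    # Current and peak memory held by PyTorch's caching allocator, in MiB.
    alloc = torch.cuda.memory_allocated(device) / 1024 ** 2
    peak = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    print(f"step {step}: allocated {alloc:.1f} MiB, peak {peak:.1f} MiB")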