Torch memory leak
See original GitHub issue
It looks like there is a memory leak with Torch. I tried to run tests/Test with different configurations, and my findings are:
1) backend=Reference
Appears to run fine. It is slow but runs with constant memory consumption.
2) backend=Torch, device=CPU
Runs faster but starts swapping after a while and gets killed by the OOM killer after exhausting all memory and swap space:
net params: 1199882
Torch
Duration |Iters| Ep| Minib| Loss
0.00:00:04 | 1 | 1 | 1/937 | 2.336721e+000 🡾 New min
0.00:00:08 | 2 | 1 | 2/937 | 2.202248e+000 🡾 New min
0.00:00:11 | 3 | 1 | 3/937 | 1.961257e+000 🡾 New min
0.00:00:22 | 4 | 1 | 4/937 | 1.806348e+000 🡾 New min
0.00:01:41 | 5 | 1 | 5/937 | 1.367965e+000 🡾 New min
<killed>
3) backend=Torch, device=GPU
Also exhausts memory, this time on the GPU:
net params: 1199882
Torch
Duration |Iters| Ep| Minib| Loss
0.00:00:02 | 1 | 1 | 1/937 | 2.316844e+000 🡾 New min
Unhandled exception. System.Runtime.InteropServices.ExternalException (0x80004005): CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.93 GiB total capacity; 4.85 GiB already allocated; 6.06 MiB free; 5.04 GiB reserved in total by PyTorch) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:289)
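The symptom in both failing configurations is the same: memory grows with every minibatch until either system RAM plus swap or the GPU is exhausted. A quick way to confirm that kind of growth is to log memory use per iteration. The sketch below does this for a plain PyTorch training loop in Python (not the library under test here); torch.cuda.memory_allocated() and resource.getrusage() are standard APIs, while the model, batch size, and step count are arbitrary placeholders.

```python
# Hypothetical sketch: instrument a PyTorch loop to see whether memory
# grows with every minibatch, mirroring the CPU and GPU symptoms above.
import resource

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, 21):
    x = torch.randn(64, 784, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

    # ru_maxrss is reported in KiB on Linux; convert to MiB for readability.
    rss_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    gpu_mib = torch.cuda.memory_allocated() / 2**20 if device == "cuda" else 0.0
    # A healthy loop plateaus after a few steps; a leak shows monotone growth.
    print(f"step {step:3d}  loss {loss.item():.4f}  rss {rss_mib:8.1f} MiB  gpu {gpu_mib:8.1f} MiB")
```

With a healthy training loop both columns level off after the first few steps; the behaviour reported above would instead show the resident set size (CPU run) or allocated GPU memory (CUDA run) climbing on every iteration.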
Issue Analytics
- Created: 3 years ago
- Comments: 8 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@gbaydin FWIW for tests\Test on Windows at the moment I see
Just in case it’s helpful.
I’ll start a separate thread about how we should test memory usage
Great, thanks for confirming!
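The separate thread on testing memory usage is not part of this page, but a minimal regression check along those lines could look roughly like the following hypothetical sketch (again plain PyTorch in Python; warmup, steps, and tolerance_mib are arbitrary choices): warm the loop up, record allocated GPU memory, run more steps, and fail if the figure keeps climbing.

```python
# Hypothetical sketch of the kind of memory-usage test mentioned above:
# after a warm-up phase, allocated GPU memory should stay roughly flat
# across further training steps; unbounded growth fails the assertion.
import torch


def assert_no_memory_growth(step_fn, warmup=5, steps=50, tolerance_mib=50.0):
    """step_fn() runs one training iteration; raises AssertionError on steady growth."""
    if not torch.cuda.is_available():
        return  # this particular check only covers the GPU path
    for _ in range(warmup):
        step_fn()
    torch.cuda.synchronize()
    baseline = torch.cuda.memory_allocated()
    for _ in range(steps):
        step_fn()
    torch.cuda.synchronize()
    growth_mib = (torch.cuda.memory_allocated() - baseline) / 2**20
    assert growth_mib < tolerance_mib, f"allocated memory grew by {growth_mib:.1f} MiB"


# Usage (train_step is a placeholder for whatever runs one optimizer step on a fixed batch):
# assert_no_memory_growth(train_step)
```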