
Torch memory leak

See original GitHub issue

It looks like there is a memory leak with the Torch backend.

I tried to run tests/Test with different configurations and my findings are:

1) backend=Reference

Appears to run fine. It is slow but runs with constant memory consumption.

2) backend=Torch, device=CPU

Runs faster but starts swapping after a while and gets killed by the OOM killer after exhausting all memory and swap space:

net params: 1199882
Torch
Duration   |Iters| Ep|  Minib| Loss
0.00:00:04 |   1 | 1 | 1/937 | 2.336721e+000 🡾 New min
0.00:00:08 |   2 | 1 | 2/937 | 2.202248e+000 🡾 New min
0.00:00:11 |   3 | 1 | 3/937 | 1.961257e+000 🡾 New min
0.00:00:22 |   4 | 1 | 4/937 | 1.806348e+000 🡾 New min
0.00:01:41 |   5 | 1 | 5/937 | 1.367965e+000 🡾 New min
<killed>

3) backend=Torch, device=GPU

Also exhausts memory, this time on the GPU:

net params: 1199882
Torch
Duration   |Iters| Ep|  Minib| Loss
0.00:00:02 |   1 | 1 | 1/937 | 2.316844e+000 🡾 New min
Unhandled exception. System.Runtime.InteropServices.ExternalException (0x80004005): CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.93 GiB total capacity; 4.85 GiB already allocated; 6.06 MiB free; 5.04 GiB reserved in total by PyTorch) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:289)
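
For context, here is a minimal Python/PyTorch-style sketch (hypothetical, not the project's own test harness) of how this kind of per-minibatch growth can be confirmed, by logging the process's resident memory after every step:

import os
import psutil

proc = psutil.Process(os.getpid())

def log_rss(step):
    # Resident set size of the current process. With a fixed batch size it
    # should stay roughly flat, so steady growth per minibatch points at
    # tensors (or their autograd graphs) being retained across iterations.
    rss_gib = proc.memory_info().rss / 1024 ** 3
    print(f"step {step:4d} | RSS {rss_gib:6.2f} GiB")

# Inside the training loop, call log_rss(i) after each minibatch; the same
# idea applies on GPU via the torch.cuda counters shown in the links below.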

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
dsyme commented, Jun 4, 2020

@gbaydin FWIW, for tests\Test on Windows at the moment I see

  • A quick rise to ~17GB after about 5 iterations, then 25GB after 8
  • About 10 iterations completed, then I hit Ctrl-C

Just in case it’s helpful.

I’ll start a separate thread about how we should test memory usage.

0 reactions
dsyme commented, Jun 10, 2020

Great, thanks for confirming!

Read more comments on GitHub >

Top Results From Across the Web

Memory Leak Debugging and Common Causes
the most useful way I found to debug is to use torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated() to print a percent of used ...
Read more >
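
A minimal sketch of that suggestion, assuming a CUDA-capable PyTorch install; it reports the memory currently allocated by tensors as a percentage of the device's total capacity:

import torch

def cuda_usage_percent(device=0):
    # Memory currently held by tensors vs. the device's total capacity.
    allocated = torch.cuda.memory_allocated(device)
    total = torch.cuda.get_device_properties(device).total_memory
    return 100.0 * allocated / total

# Printed once per iteration, a value that keeps climbing (together with a
# rising torch.cuda.max_memory_allocated(device)) suggests a leak rather
# than normal caching by the allocator.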
Memory Leakage with PyTorch
Memory Leakage with PyTorch · DETACH THE LOSS and GET ONLY ITS VALUE · MOVE MODEL, INPUT and OUTPUT to CUDA · TRY...
Read more >
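
A hedged illustration of the "detach the loss" advice above, in plain PyTorch with made-up model and variable names: accumulating the loss tensor itself keeps every iteration's autograd graph reachable, while accumulating only its value does not.

import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
running_loss = 0.0

for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Leaky: running_loss += loss    (chains every graph into running_loss)
    # Safe: keep only the Python float, or use loss.detach().
    running_loss += loss.item()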
How to debug causes of GPU memory leaks?
I am having a memory bug here (Advice on debugging a GPU memory leak in graph?) where, even when the model is in...
Read more >
Need Help Debugging Memory Leaks in PyTorch
Reduce batch size: Try reducing the batch size to see if that resolves the memory leak. These are a few strategies to help...
Read more >
PyTorch Memory Leak on LossBackward on Both GPU and ...
The PyTorch memory leak on loss.backward() can occur due to a few reasons. One common cause is the accumulation of gradients during ...
Read more >
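
And a small, self-contained demonstration of the gradient accumulation mentioned in that last result (standard PyTorch semantics, not code from this issue): backward() adds into .grad, so gradients are summed across calls unless they are cleared each iteration.

import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
print(w.grad)    # tensor([2., 2., 2.])

(w * 2).sum().backward()
print(w.grad)    # tensor([4., 4., 4.])  <- accumulated, not replaced

w.grad = None    # in a training loop: optimizer.zero_grad()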
