Operations with variable tensor sizes cause GPU Memory leaks
See original GitHub issue
TensorFlow.js version
- tfjs-core 0.11.9
- tfjs-core 0.12.10
Browser version
- Chrome 67.0.3396.99 (64-bit)
- Firefox 61.0.1 (64-bit)
Describe the problem or feature request
Running operations with variable input tensor sizes causes GPU memory leaks. The leaked memory is not tracked by the tf.memory() stats, but it can be observed externally, for example in the Chrome task manager:
// Note: this loop uses await, so it must run inside an async function.
for (let i = 0; i < iterations; i++) {
  // Pick a random tensor shape each iteration.
  const height = Math.floor(Math.random() * maxTensorSize)
  const width = Math.floor(Math.random() * maxTensorSize)
  console.log(height, width)
  const t1 = tf.ones([height, width])
  const t2 = tf.ones([height, width])
  // do something
  const sum = t1.add(t2)
  // Dispose all tensors, so tf.memory() reports no leaked tensors.
  t1.dispose()
  t2.dispose()
  sum.dispose()
  await tf.nextFrame()
  console.log(tf.memory())
}
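A minimal sketch of why this leaks, assuming (as discussed in tensorflow/tfjs#3061) that the WebGL backend compiles and caches one shader program per distinct input shape and never purges the cache. The class and names below are illustrative, not the actual tfjs-core internals:

```javascript
// Illustrative model of a per-shape shader cache (plain JS, no tfjs).
// Each distinct shape signature allocates a cache entry that is never
// released, mirroring how GPU memory accumulates in the repro above.
class ShaderCache {
  constructor() {
    this.cache = new Map() // key: shape signature, value: "compiled shader"
  }
  getOrCompile(shape) {
    const key = shape.join('x')
    if (!this.cache.has(key)) {
      // In tfjs this step would compile a WebGL program and hold GPU
      // memory for it; there is no purge, so entries only accumulate.
      this.cache.set(key, { program: `shader_for_${key}` })
    }
    return this.cache.get(key)
  }
  get size() { return this.cache.size }
}

const cache = new ShaderCache()

// Fixed shape: the cache stays at one entry no matter how often it is hit.
for (let i = 0; i < 100; i++) cache.getOrCompile([224, 224])
console.log(cache.size) // 1

// Variable shapes: every new shape adds an entry that is never released.
for (let i = 1; i <= 100; i++) cache.getOrCompile([i, i])
console.log(cache.size) // 101
```

This is why tf.memory() looks flat (all tensors are disposed) while the process's GPU memory keeps growing.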
Code to reproduce the bug / link to feature request
https://github.com/justadudewhohacks/tfjs-tensor-size-memoryleak-issue
Issue Analytics
- State:
- Created 5 years ago
- Comments: 15 (4 by maintainers)
Top Results From Across the Web

How to debug causes of GPU memory leaks? - PyTorch Forums
I understand that probably there is some variable(s) that is not freed because I keep it in the graph. The question is how...

PyTorch 101, Part 4: Memory Management and Using Multiple ...
This article covers PyTorch's advanced GPU management features, how to optimise memory usage, and best practices for debugging memory errors.

GPU memory increasing at each batch (PyTorch)
A few quick notes about training code: torch.Variable is deprecated since at least 8 minor versions (see here), don't use it; gc.collect() ...

Memory Leaks in Intel® oneAPI Math Kernel Library
Memory leaks can occur if the Intel® oneAPI Math Kernel Library is ... impact the performance of some oneMKL functions, especially for small...

Running out of GPU memory with just 3 samples of ...
Before the first onBatchEnd is called, I'm getting a High memory usage in GPU, most likely due to a memory leak warning, but...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments

In case someone is facing the same issue: when training an image classifier or an object detector, you can mitigate it by resizing your images to a fixed input size before calling tf.fromPixels, instead of doing tensor operations for padding and resizing.

tf.memory() is not increasing for me. My input has varying sizes as well, and for each new size a new shader is created and cached in the TFJS library: https://github.com/tensorflow/tfjs/issues/3061 There is no cache purge, so it slowly accumulates GPU memory (as seen in the Chrome Task Manager). You might reach a similar issue if you have different sizes for webcamElement.
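The fixed-input-size mitigation above can be sketched as follows. This is a hypothetical helper (not a tfjs API) that computes the scaled dimensions for letterboxing an arbitrary image into a fixed square, so that every tensor fed to the model has the same shape and only one shader per op is ever compiled:

```javascript
// Hypothetical helper: fit an arbitrary (width, height) image into a fixed
// targetSize x targetSize square while preserving aspect ratio. Resizing
// the image element (or canvas) to these dimensions before tf.fromPixels
// keeps the input tensor shape constant, so the shader cache stops growing.
function letterboxDims(width, height, targetSize) {
  const scale = targetSize / Math.max(width, height)
  const scaledWidth = Math.round(width * scale)
  const scaledHeight = Math.round(height * scale)
  return {
    scaledWidth,
    scaledHeight,
    // Padding needed to fill the remaining square area.
    padRight: targetSize - scaledWidth,
    padBottom: targetSize - scaledHeight,
  }
}

// Example: fit a 640x480 webcam frame into a 416x416 model input.
const dims = letterboxDims(640, 480, 416)
console.log(dims) // { scaledWidth: 416, scaledHeight: 312, padRight: 0, padBottom: 104 }
```

In practice you would draw the frame into an offscreen canvas of the fixed target size (letterboxed with these dimensions) and pass that canvas to tf.fromPixels, rather than padding with tensor ops.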