Operations with variable tensor sizes cause GPU Memory leaks

TensorFlow.js version

  • tfjs-core 0.11.9
  • tfjs-core 0.12.10

Browser version

  • chrome 67.0.3396.99 (64-Bit)
  • firefox 61.0.1 (64-Bit)

Describe the problem or feature request

Running operations on input tensors of varying sizes leaks GPU memory. The leak does not show up in the tf.memory() stats, but it can be observed externally, for example in the Chrome Task Manager:

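// `await` is only valid inside an async function, so the loop is wrapped in one.
// `iterations` and `maxTensorSize` are supplied by the caller.
async function run(iterations, maxTensorSize) {
  for (let i = 0; i < iterations; i++) {
    // Pick a new random shape on every iteration
    const height = Math.floor(Math.random() * maxTensorSize)
    const width = Math.floor(Math.random() * maxTensorSize)

    console.log(height, width)

    const t1 = tf.ones([height, width])
    const t2 = tf.ones([height, width])

    // do something
    const sum = t1.add(t2)

    // All tensors are disposed, so tf.memory() stays flat
    t1.dispose()
    t2.dispose()
    sum.dispose()

    await tf.nextFrame()

    console.log(tf.memory())
  }
}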

Code to reproduce the bug / link to feature request

https://github.com/justadudewhohacks/tfjs-tensor-size-memoryleak-issue

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 15 (4 by maintainers)

Top GitHub Comments

1 reaction
justadudewhohacks commented, Aug 27, 2018

In case someone is facing the same issue when training an image classifier or an object detector: you can mitigate it by resizing your images to a fixed input size on a canvas before calling tf.fromPixels, instead of using tensor operations for padding and resizing:

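export function imageToSquare(img: HTMLImageElement | HTMLCanvasElement, inputSize: number): HTMLCanvasElement {
  // Images expose their intrinsic size via naturalWidth/naturalHeight; a canvas exposes width/height directly
  const dims = img instanceof HTMLImageElement
    ? { width: img.naturalWidth, height: img.naturalHeight }
    : img
  // Scale the longer side to inputSize, preserving the aspect ratio
  const scale = inputSize / Math.max(dims.height, dims.width)
  const width = scale * dims.width
  const height = scale * dims.height

  const targetCanvas = document.createElement('canvas')
  targetCanvas.width = inputSize
  targetCanvas.height = inputSize
  const ctx = targetCanvas.getContext('2d')
  if (!ctx) {
    throw new Error('2D canvas context is not available')
  }
  // Draw into the top-left corner; the rest of the square stays empty
  ctx.drawImage(img, 0, 0, width, height)

  return targetCanvas
}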
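
The scaling arithmetic in the function above can be checked without a browser. The following is only a sketch: `fitToSquare` and the `Size` interface are hypothetical names introduced here for illustration, and the `Math.round` calls are an addition that the original function does not make.

```typescript
// Hypothetical helper, not part of tfjs or the original comment.
// Computes the drawn size of an image after its longer side is
// scaled to `inputSize`, preserving the aspect ratio.
interface Size {
  width: number
  height: number
}

function fitToSquare(src: Size, inputSize: number): Size {
  const longest = Math.max(src.width, src.height)
  if (longest <= 0 || inputSize <= 0) {
    throw new Error('dimensions must be positive')
  }
  const scale = inputSize / longest
  // Rounding is an assumption made here to get whole pixels;
  // the canvas version above passes the unrounded values to drawImage.
  return {
    width: Math.round(src.width * scale),
    height: Math.round(src.height * scale),
  }
}

const landscape = fitToSquare({ width: 1024, height: 512 }, 256)
const portrait = fitToSquare({ width: 300, height: 600 }, 150)
console.log(landscape, portrait)
```

Whatever the source dimensions, the canvas itself is always inputSize x inputSize, so the tensor created from it always has the same shape.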
0 reactions
dhasegan commented, May 13, 2020

tf.memory() is not increasing for me. My inputs also vary in size, and for each new size a new shader is created and cached in the TFJS library: https://github.com/tensorflow/tfjs/issues/3061

There is no cache purge, so GPU memory slowly accumulates (as seen in the Chrome Task Manager). You might run into a similar issue if webcamElement varies in size.
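
If a single fixed input size is not an option, one way to limit the number of cached shaders is to round each dimension up to a small set of bucket sizes, so that many raw sizes collapse onto a few shapes. This is a sketch under assumptions: it takes the comment's description (one cached shader per new size) at face value, it assumes padding inputs up to the bucketed size is acceptable for your model, and `bucketDim`, `bucketShape`, and `countDistinctShapes` are hypothetical helpers, not tfjs APIs.

```typescript
// Hypothetical helpers, not part of tfjs.
// Round a dimension up to the next multiple of `step` (at least `step`).
function bucketDim(size: number, step: number): number {
  if (size < 0 || step <= 0) {
    throw new Error('size must be >= 0 and step must be > 0')
  }
  return Math.max(step, Math.ceil(size / step) * step)
}

function bucketShape(height: number, width: number, step: number): [number, number] {
  return [bucketDim(height, step), bucketDim(width, step)]
}

// Count how many distinct [height, width] shapes appear in a list.
function countDistinctShapes(shapes: Array<[number, number]>): number {
  const seen: { [key: string]: boolean } = {}
  for (const [h, w] of shapes) {
    seen[h + 'x' + w] = true
  }
  return Object.keys(seen).length
}

const raw: Array<[number, number]> = [[97, 203], [101, 250], [120, 199], [128, 256]]
const bucketed = raw.map(([h, w]) => bucketShape(h, w, 128))

// Four distinct raw shapes collapse onto a single bucketed shape here.
console.log(countDistinctShapes(raw), countDistinctShapes(bucketed))
```

The step size is a trade-off: larger buckets mean fewer distinct shapes but more padded pixels per input.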
