Memory leak during training

TensorFlow.js version

@tensorflow/tfjs@^0.9.1: version 0.9.1 (dependencies: @tensorflow/tfjs-core 0.7.1, @tensorflow/tfjs-layers 0.4.1)

Browser version

Chrome 68.0.3397.0 (Official Build) canary (64-bit)

Describe the problem or feature request

When training a simple model with fit(), I’m experiencing a huge memory leak. Memory usage keeps growing until it consumes all of my GPU’s memory, and then it starts consuming my system’s shared memory.

I’ve looked at the examples, and fit() is never surrounded by tidy(). I suppose that’s because it already uses tidy() internally?
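The question above is about how tf.tidy() works. The tfjs API documentation describes tidy() as running a function and then disposing every intermediate tensor that function allocated, except the ones it returns, and states that the function must not return a Promise. The sketch below is a toy model of that scope-based cleanup in plain JavaScript. It is not the tfjs implementation, and `makeTensor`, `toyTidy` and `fakeFit` are hypothetical names. It shows the synchronous case working, and why an async call such as fit() is not covered: the scope closes as soon as the first `await` yields, so tensors created afterwards are never tracked.

```javascript
// Toy model of scope-based tensor cleanup in the spirit of tf.tidy().
// This is NOT the tfjs implementation; every name here is hypothetical.
const live = new Set();   // all "tensors" that have not been disposed
const scopes = [];        // stack of currently open tidy scopes

function makeTensor(name) {
  const t = { name, disposed: false };
  live.add(t);
  // Register the new tensor with the innermost open scope, if any.
  if (scopes.length > 0) scopes[scopes.length - 1].add(t);
  return t;
}

function dispose(t) {
  t.disposed = true;
  live.delete(t);
}

// Run fn in a fresh scope, then dispose everything fn allocated except its result.
function toyTidy(fn) {
  const scope = new Set();
  scopes.push(scope);
  let result;
  try {
    result = fn();
  } finally {
    scopes.pop();
  }
  for (const t of scope) {
    if (t !== result) dispose(t);
  }
  // Hand a surviving result to the enclosing scope so nested tidies still clean up.
  if (live.has(result) && scopes.length > 0) scopes[scopes.length - 1].add(result);
  return result;
}

// Synchronous work: the intermediate is freed and the returned tensor survives.
const kept = toyTidy(() => {
  makeTensor('tmp');
  return makeTensor('out');
});
console.log([...live].map((t) => t.name)); // [ 'out' ]

// Async work: the scope has already closed when execution resumes after `await`,
// so the tensor created afterwards belongs to no scope and is never disposed.
async function fakeFit() {
  await Promise.resolve();
  makeTensor('gradient');
}
const pending = toyTidy(() => fakeFit());
pending.then(() => console.log([...live].map((t) => t.name))); // [ 'out', 'gradient' ]
```

Whether fit() uses tidy() internally for each batch is a separate question that this toy model does not answer; it only illustrates why wrapping the async call from the outside cannot catch everything.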

Code to reproduce the bug / link to feature request

This code is enough to reproduce the issue:

import * as tf from '@tensorflow/tfjs';

async function main() {
    // Small MLP: 256 inputs -> 128 -> 64 -> 6-way softmax.
    const inputs = tf.layers.input({ shape: [256], dtype: 'float32' });
    const dense1 = tf.layers.dense({ units: 128, activation: 'relu' }).apply(inputs);
    const dense2 = tf.layers.dense({ units: 64, activation: 'relu' }).apply(dense1);
    const outputs = tf.layers.dense({ units: 6, activation: 'softmax' }).apply(dense2);
    const model = tf.model({ inputs, outputs });

    model.compile({
        loss: 'categoricalCrossentropy',
        optimizer: 'adam',
    });

    // Dummy data: all-zero features and all-zero class labels.
    const numSamples = 25632;
    const xData = new Float32Array(numSamples * 256);
    const yData = new Int32Array(numSamples);

    const x = tf.tensor2d(xData, [numSamples, 256], 'float32');
    // oneHot expects int32 indices; tidy() disposes the intermediate 1-D label tensor.
    const y = tf.tidy(() => tf.oneHot(tf.tensor1d(yData, 'int32'), 6));

    // Calling fit() repeatedly on the same data is where memory keeps growing.
    for (let i = 0; i < 100; i++) {
        await model.fit(x, y, {
            batchSize: 64,
            epochs: 5,
            shuffle: true,
            validationSplit: 0.2,
        });
    }
}

main();
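One way to confirm the leak in the reproduction is to record `tf.memory().numTensors` (a tfjs call that reports how many tensors are currently allocated) after each `fit()` call and check whether it keeps climbing. The helper below, `looksLikeLeak`, is a hypothetical plain-JavaScript sketch of that check; the readings in the example are made-up illustrative numbers, not measurements from this issue.

```javascript
// Hypothetical helper: given one tf.memory().numTensors reading per fit() call,
// report whether the tensor count rose on every call (a strong hint of a leak).
function looksLikeLeak(counts) {
  if (counts.length < 3) return false; // too few readings to judge
  for (let i = 1; i < counts.length; i++) {
    if (counts[i] <= counts[i - 1]) return false;
  }
  return true;
}

// How it could be wired into the repro loop (requires @tensorflow/tfjs):
//   const counts = [];
//   for (let i = 0; i < 100; i++) {
//     await model.fit(x, y, { batchSize: 64, epochs: 5, shuffle: true, validationSplit: 0.2 });
//     counts.push(tf.memory().numTensors);
//   }
//   console.log(looksLikeLeak(counts));

// Illustrative readings only, not measurements from this issue.
console.log(looksLikeLeak([120, 120, 120, 120])); // false
console.log(looksLikeLeak([120, 186, 252, 318])); // true
```

Requiring growth on every call is deliberately conservative: a count that jumps once on the first call (for example from one-time allocations) and then stays flat is not flagged.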

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

caisq commented, May 22, 2018 (1 reaction)

@bileschi The memory optimization that went into the latest release is orthogonal to this issue. I have it on my TODO list to look at this issue soon.

bileschi commented, May 22, 2018 (1 reaction)

tfjs 0.11.1 uses tfjs-layers 0.6.1. I figured this out by looking at the package.json file: https://github.com/caisq/tfjs-1/blob/dc599fdf73f9eb9e4940590d137546570c9012b4/package.json

Sounds like we need to dig deeper to find the root cause. In the meantime, if this is blocking you, can you wrap your call to model.fit() in a tidy() block?
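`model.fit()` returns a Promise, while the tfjs documentation says the function passed to `tf.tidy()` must not return one, so a literal `tf.tidy(() => model.fit(...))` is not expected to clean up what fit() allocates after its first `await`. A common alternative for tensors your own code creates is to dispose them explicitly once the awaited call settles. The sketch below assumes only that each object has a `dispose()` method, which tfjs tensors do; `runThenDispose` is a hypothetical helper, and the stand-in objects replace real tensors so the example runs without tfjs. It cannot free anything leaked inside fit() itself.

```javascript
// Hypothetical helper: await an async step (e.g. a model.fit() call) and then
// dispose the given objects, whether the step succeeded or threw.
async function runThenDispose(disposables, step) {
  try {
    return await step();
  } finally {
    for (const d of disposables) d.dispose();
  }
}

// Stand-ins for tensors so the sketch runs without tfjs; a real tf.Tensor
// also exposes dispose().
const fake = (name) => ({ name, disposed: false, dispose() { this.disposed = true; } });

const xs = fake('xs');
const ys = fake('ys');
runThenDispose([xs, ys], async () => 'history').then((result) => {
  console.log(result, xs.disposed, ys.disposed); // history true true
});

// With tfjs, a per-iteration call could look roughly like:
//   const history = await runThenDispose([xs, ys], () => model.fit(xs, ys, { epochs: 5 }));
```

This pattern suits data that is created fresh for each training call; in the reproduction above, `x` and `y` are reused across iterations, so they would be disposed once after the loop instead.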
