Memory leak during training
See original GitHub issue.
TensorFlow.js version
@tensorflow/tfjs 0.9.1 (dependencies: @tensorflow/tfjs-core 0.7.1, @tensorflow/tfjs-layers 0.4.1)
Browser version
68.0.3397.0 (Official Build) canary (64-bit)
Describe the problem or feature request
When training a simple model with fit(), I'm experiencing a huge memory leak (see the chart in the original issue). It grows until it eventually consumes my GPU's memory, and then it starts consuming my system's shared memory.
I've looked at the examples, and fit() is never surrounded by tidy(). I suppose that's because it already uses tidy() internally?
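For anyone trying to confirm the leak, tf.memory() reports the number of live tensors and allocated bytes on the current backend; if numTensors keeps climbing across fit() calls, tensors are not being disposed. A minimal diagnostic sketch (the logging loop is my own, but tf.memory() and its numTensors/numBytes fields are part of the tfjs API):

import * as tf from '@tensorflow/tfjs';

// Log live-tensor counts around each fit() call; a count that grows
// steadily between iterations indicates leaked tensors.
async function fitWithMemoryLog(model, x, y, iterations) {
  for (let i = 0; i < iterations; i++) {
    const before = tf.memory();
    await model.fit(x, y, { batchSize: 64, epochs: 1 });
    const after = tf.memory();
    console.log(`iter ${i}: numTensors ${before.numTensors} -> ${after.numTensors}, ` +
                `numBytes ${before.numBytes} -> ${after.numBytes}`);
  }
}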
Code to reproduce the bug / link to feature request
This code is enough to reproduce the issue:
import * as tf from '@tensorflow/tfjs';

// Build a small dense classifier.
// (dtype is passed as a string; the original DType.float32 enum needed
// an extra import and is unnecessary here.)
const inputs = tf.layers.input({ shape: [256], dtype: 'float32' });
const dense1 = tf.layers.dense({ units: 128, activation: 'relu' }).apply(inputs);
const dense2 = tf.layers.dense({ units: 64, activation: 'relu' }).apply(dense1);
const outputs = tf.layers.dense({ units: 6, activation: 'softmax' }).apply(dense2);
const model = tf.model({ inputs, outputs });
model.compile({
  loss: 'categoricalCrossentropy',
  optimizer: 'adam',
});

// Dummy data: all zeros, so every label is class 0.
const numSamples = 25632;
const xData = new Float32Array(numSamples * 256);
const yData = new Int32Array(numSamples);
const x = tf.tensor2d(xData, [numSamples, 256], 'float32');
// oneHot expects int32 indices ('float32' in the original was a typo).
const y = tf.oneHot(tf.tensor1d(yData, 'int32'), 6);

// await is only valid inside an async function.
(async () => {
  for (let i = 0; i < 100; i++) {
    await model.fit(x, y, {
      batchSize: 64,
      epochs: 5,
      shuffle: true,
      validationSplit: 0.2,
    });
  }
})();
Issue Analytics
- Created: 5 years ago
- Comments: 8 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@bileschi The memory optimization that went into the latest release is orthogonal to this issue. I have it on my TODO list to look at this issue soon.
tfjs 0.11.1 uses tfjs-layers 0.6.1. I figured this out by looking at the package.json file: https://github.com/caisq/tfjs-1/blob/dc599fdf73f9eb9e4940590d137546570c9012b4/package.json
Sounds like we need to dig deeper to find the root cause. In the meantime, if this is blocking for you, can you wrap your call to model.fit() in a tidy() block?
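For readers reaching for that workaround: tf.tidy() requires a synchronous callback and will not accept a function that returns a Promise, so an asynchronous call like model.fit() cannot be wrapped in it directly. A rough interim alternative is manual cleanup, disposing the input tensors once training finishes and watching tf.memory() to confirm the tensor count levels off. A sketch (the structure and names here are mine, not from the issue):

import * as tf from '@tensorflow/tfjs';

// Manual disposal instead of tidy(): tidy() callbacks must be
// synchronous, so async work such as model.fit() cannot go inside one.
async function trainAndCleanUp(model, x, y) {
  try {
    for (let i = 0; i < 100; i++) {
      await model.fit(x, y, { batchSize: 64, epochs: 5 });
      console.log(`live tensors after iteration ${i}: ${tf.memory().numTensors}`);
    }
  } finally {
    tf.dispose([x, y]); // release the inputs once they are no longer needed
  }
}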