Memory leak during training
See original GitHub issue.
TensorFlow.js version
@tensorflow/tfjs 0.9.1 (dependencies: @tensorflow/tfjs-core 0.7.1, @tensorflow/tfjs-layers 0.4.1)
Browser version
68.0.3397.0 (Official Build) canary (64-bit)
Describe the problem or feature request
When training a simple model with fit(), I'm experiencing a huge memory leak (see the chart in the original issue). It grows until it eventually consumes my GPU's memory, and then it starts consuming my system's shared memory.
I've looked at the examples, and fit() is never surrounded by tidy(). I suppose that's because it already uses tidy() internally?
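For anyone trying to confirm the leak, tf.memory() reports the number of live tensors and allocated bytes on the current backend; if numTensors keeps climbing across fit() calls, tensors are not being disposed. A minimal diagnostic sketch (the logging loop is my own, but tf.memory() and its numTensors/numBytes fields are part of the tfjs API):

import * as tf from '@tensorflow/tfjs';

// Log live-tensor counts around each fit() call; a count that grows
// steadily between iterations indicates leaked tensors.
async function fitWithMemoryLog(model, x, y, iterations) {
  for (let i = 0; i < iterations; i++) {
    const before = tf.memory();
    await model.fit(x, y, { batchSize: 64, epochs: 1 });
    const after = tf.memory();
    console.log(`iter ${i}: numTensors ${before.numTensors} -> ${after.numTensors}, ` +
                `numBytes ${before.numBytes} -> ${after.numBytes}`);
  }
}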
Code to reproduce the bug / link to feature request
This code is enough to reproduce the issue:
import * as tf from '@tensorflow/tfjs';

// Build a small dense classifier.
// (dtype is passed as a string; the original DType.float32 enum needed
// an extra import and is unnecessary here.)
const inputs = tf.layers.input({ shape: [256], dtype: 'float32' });
const dense1 = tf.layers.dense({ units: 128, activation: 'relu' }).apply(inputs);
const dense2 = tf.layers.dense({ units: 64, activation: 'relu' }).apply(dense1);
const outputs = tf.layers.dense({ units: 6, activation: 'softmax' }).apply(dense2);
const model = tf.model({ inputs, outputs });
model.compile({
  loss: 'categoricalCrossentropy',
  optimizer: 'adam',
});

// Dummy data: all zeros, so every label is class 0.
const numSamples = 25632;
const xData = new Float32Array(numSamples * 256);
const yData = new Int32Array(numSamples);
const x = tf.tensor2d(xData, [numSamples, 256], 'float32');
// oneHot expects int32 indices ('float32' in the original was a typo).
const y = tf.oneHot(tf.tensor1d(yData, 'int32'), 6);

// await is only valid inside an async function.
(async () => {
  for (let i = 0; i < 100; i++) {
    await model.fit(x, y, {
      batchSize: 64,
      epochs: 5,
      shuffle: true,
      validationSplit: 0.2,
    });
  }
})();
Issue Analytics
- Created: 5 years ago
- Comments: 8 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@bileschi The memory optimization that went into the latest release is orthogonal to this issue. I have it on my TODO list to look at this issue soon.
tfjs 0.11.1 uses tfjs-layers 0.6.1. I figured this out by looking at the package.json file: https://github.com/caisq/tfjs-1/blob/dc599fdf73f9eb9e4940590d137546570c9012b4/package.json
Sounds like we need to dig deeper to find the root cause. In the meantime, if this is blocking for you, can you wrap your call to model.fit() in a tidy() block?
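For readers reaching for that workaround: tf.tidy() requires a synchronous callback and will not accept a function that returns a Promise, so an asynchronous call like model.fit() cannot be wrapped in it directly. A rough interim alternative is manual cleanup, disposing the input tensors once training finishes and watching tf.memory() to confirm the tensor count levels off. A sketch (the structure and names here are mine, not from the issue):

import * as tf from '@tensorflow/tfjs';

// Manual disposal instead of tidy(): tidy() callbacks must be
// synchronous, so async work such as model.fit() cannot go inside one.
async function trainAndCleanUp(model, x, y) {
  try {
    for (let i = 0; i < 100; i++) {
      await model.fit(x, y, { batchSize: 64, epochs: 5 });
      console.log(`live tensors after iteration ${i}: ${tf.memory().numTensors}`);
    }
  } finally {
    tf.dispose([x, y]); // release the inputs once they are no longer needed
  }
}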