ModelCheckpoint files not closed
System information. tf_env.txt
- Custom code (see below)
- MacOS 12.3.1
- TensorFlow installed via pip
- TensorFlow version: 2.9.1
- Python version: 3.8.11
- GPU model and memory: not specified (CPU: 8-Core Intel Core i9, 2.3 GHz; 16 GB RAM)
Describe the problem.
When I load weights into a model and use `tf.keras.callbacks.ModelCheckpoint` to store weights after each training epoch, the checkpoint files remain open. After many repetitions I eventually run out of resources (too many open files).
(My dataset is large and split into batches.)
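Until the underlying leak is fixed, one possible stopgap (an assumption, not something proposed in this issue) is to raise the process's soft limit on open files with Python's standard `resource` module (Unix only). This only postpones the "too many open files" error; it does not close the leaked handles. The function name `raise_open_file_limit` and its default target of 4096 are hypothetical.

```python
import resource


def raise_open_file_limit(target: int = 4096) -> int:
    """Hypothetical helper: raise the soft RLIMIT_NOFILE toward `target`.

    Never lowers the current limit and never exceeds the hard limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    # An unlimited hard limit lets us go straight to the target.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling it once at the top of the script, before the training loop, would give the leaking process more headroom; the leak itself still needs a real fix.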
Describe the current behavior.
After training has completed (using `model.fit`), the checkpoint files are never closed.
Describe the expected behavior.
After `model.fit` has completed, the checkpoint files should be closed.
- Do you want to contribute a PR? (yes/no): no
Standalone code to reproduce the issue.
This snippet prints all checkpoint files that are still open after `model.fit` has completed:
```python
import pandas as pd
import psutil
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential

print('tf version: ', tf.version.VERSION)


def train(model, X, y):
    model_dir = './test'
    model_path = './test/cp.ckpt'
    # Save the weights to the same checkpoint prefix after every epoch.
    checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=model_path, save_weights_only=True, verbose=0)
    try:
        # Resume from the most recent checkpoint, if there is one.
        latest = tf.train.latest_checkpoint(model_dir)
        model.load_weights(latest)
    except AttributeError:
        # On the first run there is no checkpoint yet: latest_checkpoint()
        # returns None and load_weights(None) fails.
        print('First run - not reading checkpoint')
    history = model.fit(X, y, batch_size=32, epochs=10, verbose=0,
                        callbacks=[checkpoint_callback])
    return history


# Three samples with one feature each.
X = pd.DataFrame([[1], [2], [3]])
y = pd.DataFrame([[1], [2], [3]])

model = Sequential([
    Input(shape=(1,)),
    Dense(1),
])
model.compile(loss='binary_crossentropy', metrics=['accuracy'])

# Repeat training; the open checkpoint files accumulate across calls.
for _ in range(5):
    history = train(model, X, y)

print('Open files:')
proc = psutil.Process()
for file in proc.open_files():
    print(file[0])
```
Output.
```
Open files:
/test/cp.ckpt.index
/test/cp.ckpt.data-00000-of-00001
/test/cp.ckpt.index
/test/cp.ckpt.data-00000-of-00001
/test/cp.ckpt.index
/test/cp.ckpt.data-00000-of-00001
/test/cp.ckpt.index
/test/cp.ckpt.data-00000-of-00001
/test/cp.ckpt.index
/test/cp.ckpt.data-00000-of-00001
```
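To see how many handles each repetition adds without depending on `psutil`, one could count the process's file descriptors before and after each `train()` call. The sketch below is a swapped-in technique, not part of the original report: it assumes a Unix-like OS exposing `/dev/fd` or `/proc/self/fd` (macOS and Linux do), and `count_open_fds` is a hypothetical name.

```python
import os


def count_open_fds() -> int:
    """Hypothetical helper: number of file descriptors open in this process.

    The descriptor used to read the directory is included in every count,
    so the difference between two calls is still meaningful.
    """
    for fd_dir in ("/dev/fd", "/proc/self/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no /dev/fd or /proc/self/fd on this platform")
```

Unlike `psutil.Process().open_files()`, this counts every descriptor (pipes and sockets too), so it is the change between calls, not the absolute value, that points to a leak.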
Top GitHub Comments
@jonasrundberg After a bit of investigation, it turns out that the open files come from `model.load_weights(latest)` and not `ModelCheckpoint`. I'll keep investigating why this happens.
Thanks!
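If the handles are leaked inside `model.load_weights`, another workaround (not suggested in the thread) is to run each training repetition in a fresh Python interpreter, because the operating system closes every descriptor a process holds when it exits. A minimal standard-library sketch; `run_isolated` and the per-run script `train_once.py` mentioned below are hypothetical names.

```python
import subprocess
import sys


def run_isolated(args, timeout=None):
    """Hypothetical helper: run the current interpreter with `args` in a child process.

    Any file descriptors leaked by the child are released when it exits.
    Raises CalledProcessError if the child fails; returns its stdout.
    """
    result = subprocess.run(
        [sys.executable, *args],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

In the reproduction, the loop would then call something like `run_isolated(["train_once.py"])` five times, where `train_once.py` would hold the body of `train()` and read and write the same `./test/cp.ckpt` checkpoint. This trades interpreter and TensorFlow start-up time for a bounded number of open files.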