AutoModel fit does not save the best model for export later

Bug Description

After AutoModel fit finishes, exporting the model fails because a checkpoint file is missing. The error message is:

ValueError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /ak_vanilla/trial_24d08bd5d9cf85fdcf31a67e75367d72/checkpoints/epoch_20/checkpoint: Not found: /ak_vanilla/trial_24d08bd5d9cf85fdcf31a67e75367d72/checkpoints/epoch_20; No such file or directory

Looking into the checkpoints directory, the folder for epoch_0 is present, followed by epoch_21, epoch_22, … epoch_30 (max epoch). However, epoch_20, the one the export tries to load, is missing. I am not sure why this happens.

screenshot of the directory: https://prnt.sc/t6qacf
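
To check a trial's checkpoint folder for gaps like the one described above, something along these lines may help. It is only a sketch: `missing_epochs` is a hypothetical helper, not part of AutoKeras or Keras Tuner, and it assumes the checkpoint folders are named epoch_N as in the error message. The demo runs on a temporary directory that mimics the reported listing rather than on a real trial directory.

```python
import re
import tempfile
from pathlib import Path


def missing_epochs(checkpoint_dir):
    """Return epoch numbers absent between the lowest and highest epoch_N folder.

    Hypothetical helper for inspection only; assumes folders are named epoch_N.
    """
    pattern = re.compile(r"epoch_(\d+)$")
    found = set()
    for entry in Path(checkpoint_dir).iterdir():
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            found.add(int(match.group(1)))
    if not found:
        return []
    return [n for n in range(min(found), max(found) + 1) if n not in found]


# Demo on a throwaway directory that mimics the listing in the report:
# epoch_0 plus epoch_21 through epoch_30.
with tempfile.TemporaryDirectory() as tmp:
    for n in [0, *range(21, 31)]:
        (Path(tmp) / f"epoch_{n}").mkdir()
    gaps = missing_epochs(tmp)

print(gaps)
```

On a real run, the argument would be the trial's checkpoints directory from the error message.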

Bug Reproduction

import autokeras as ak
from tensorflow.keras.datasets import mnist

# Load MNIST and keep only the first 100 training samples.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[:100]
y_train = y_train[:100]
print(x_train.shape)  # (100, 28, 28)
print(y_train.shape)  # (100,)
print(y_train[:3])  # [5 0 4]

# Initialize the image regressor.
reg = ak.ImageRegressor(
    overwrite=True,
    max_trials=10)
# Feed the image regressor with training data.
reg.fit(x_train, y_train, epochs=30)

mdl = reg.export_model()

Data used by the code:

Loading default mnist_data (shown in the code)

Expected Behavior

export_model() should export the Keras model. AutoModel.fit() should save all the epochs during training.

Setup Details

  • Ubuntu 18.04
  • Python 3.6.9
  • autokeras==1.0.3
  • keras-tuner==1.0.2rc0
  • sklearn==0.23.1
  • numpy==1.18.4
  • pandas==1.0.5
  • tensorflow==2.2.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (2 by maintainers)

Top GitHub Comments

3 reactions
I-Kryachko commented, Jul 7, 2020

I don't think you need to fix this by hardcoding the metric for a particular task.

The problem for me was that autokeras could not load the correct model checkpoint for an epoch because it was looking for one that had already been deleted. That is why I got the error after the trials finished, when the final "best model" loop started going through the best trials.

The deletion algorithm was described earlier and is in the save_model method. While debugging it, I noticed that the epoch value shown in the console differs from the step value, and that the step value in trial.json equals epoch_value. Then I noticed that the deleted epoch is the one immediately before the first epoch still present in my checkpoints directory.

For example, if the best step for a trial is 9, the best epoch number is 10 in the console log and the checkpoint is saved in the epoch_9 directory. The save_model method then deletes my epoch_9 directory because it starts deleting from the wrong epoch number. So I changed this line in the method above:

epoch_to_delete = epoch - self._save_n_checkpoints

to this:

epoch_to_delete = epoch - self._save_n_checkpoints - 1

Now my best checkpoints are stored correctly. Hope this helps you too.
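
The effect of that one-line change can be shown with a small standalone simulation. This is a sketch under stated assumptions, not the Keras Tuner implementation: `surviving_checkpoints`, `keep_last` and `extra_offset` are hypothetical names, checkpoints are assumed to be written for steps 0 through 30, and pruning is assumed to start only once the step exceeds `keep_last`. The value 10 for `keep_last` is inferred from the ten folders left in the report.

```python
def surviving_checkpoints(last_step, keep_last, extra_offset=0):
    """Simulate a rolling checkpoint cleanup and return the steps left on disk.

    Toy model only (assumption): one checkpoint is saved per step, then the
    checkpoint `keep_last + extra_offset` steps back is deleted, but only
    once `step > keep_last`.
    """
    on_disk = set()
    for step in range(last_step + 1):
        on_disk.add(step)
        step_to_delete = step - keep_last - extra_offset
        if step > keep_last:
            on_disk.discard(step_to_delete)
    return sorted(on_disk)


# Original formula: epoch - save_n_checkpoints
original = surviving_checkpoints(last_step=30, keep_last=10)
# Patched formula from the comment: epoch - save_n_checkpoints - 1
patched = surviving_checkpoints(last_step=30, keep_last=10, extra_offset=1)

print(original)
print(patched)
```

Under these assumptions the original formula leaves step 0 and steps 21 to 30, which matches the directory listing in the report, while the patched formula keeps step 20 as well.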

2 reactions
haifeng-jin commented, Jul 12, 2020

I examined this bug and have the fix in PR #1229.
