Keras callback creating .profile-empty file blocks loading data

Repro steps:

  1. Create a virtualenv with tf-nightly-2.0-preview==2.0.0.dev20190402 and open two terminals in this environment.

  2. In one terminal, run the following simple Python script (but continue to the next step while this script is still running):

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    
    import tensorflow as tf
    
    
    DATASET = tf.keras.datasets.mnist
    INPUT_SHAPE = (28, 28)
    OUTPUT_CLASSES = 10
    
    
    def model_fn():
      model = tf.keras.models.Sequential([
          tf.keras.layers.Input(INPUT_SHAPE),
          tf.keras.layers.Flatten(),
          tf.keras.layers.Dense(128, activation="relu"),
          tf.keras.layers.BatchNormalization(),
          tf.keras.layers.Dense(256, activation="relu"),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Dense(OUTPUT_CLASSES, activation="softmax"),
      ])
      model.compile(
          loss="sparse_categorical_crossentropy",
          optimizer="adagrad",
          metrics=["accuracy"],
      )
      return model
    
    
    def main():
      model = model_fn()
      ((x_train, y_train), (x_test, y_test)) = DATASET.load_data()
      model.fit(
          x=x_train,
          y=y_train,
          validation_data=(x_test, y_test),
          callbacks=[tf.keras.callbacks.TensorBoard()],
          epochs=5,
      )
    
    
    if __name__ == "__main__":
      main()
    
  3. Wait for (say) epoch 2/5 to finish training. Then, in the other terminal, launch tensorboard --logdir ./logs.

  4. Open TensorBoard and observe that both training and validation runs appear with two epochs’ worth of data:

    Screenshot just after launching TensorBoard

  5. As training continues, refresh TensorBoard and/or reload the page. Observe that validation data continues to appear but training data has stalled. Even well after training has completed, the plot is incomplete:

    Screenshot of bad state

  6. Kill the TensorBoard process and restart it. Note that the data appears as desired:

    Screenshot of good state after TensorBoard relaunch

The same problem occurs in tf-nightly (non-2.0-preview), but manifests differently: because there is only one run (named .) instead of separate train/validation, all data stops being displayed after the epoch in which TensorBoard is opened.

As a special case, if TensorBoard is already running before training starts, training data may not appear at all:

Screenshot of validation-only data
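
The issue title attributes the stall to a .profile-empty file written by the Keras callback. As a rough diagnostic sketch, the snippet below lists such files in a log directory. It assumes the file names end in ".profile-empty" (taken from the title) and that the logs live under ./logs as in the repro steps; the helper name find_profile_empty_files is made up for illustration.

```python
import os


def find_profile_empty_files(logdir):
    """Return sorted paths under `logdir` whose names end in '.profile-empty'."""
    matches = []
    # Walk the whole tree, since runs such as train/ and validation/
    # are stored in subdirectories of the log directory.
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            # Assumption: the suffix comes from the issue title.
            if name.endswith(".profile-empty"):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_profile_empty_files("./logs"):
        print(path)
```

If this prints a path inside the stalled run's directory, that is consistent with the behavior described in the title; an empty result suggests looking elsewhere.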

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 8
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

21 reactions
wchargin commented, Apr 15, 2019

Just to note explicitly: setting profile_batch=0 in the Keras callback options is a workaround that disables profiling entirely.

4 reactions
grwlf commented, Nov 12, 2019

FYI: The problem is still here, in 1cf0898dd of TF v2.0.0. Workaround above works.
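
For reference, the profile_batch=0 workaround mentioned in the comments might look like the sketch below. The names TENSORBOARD_KWARGS and make_tensorboard_callback are made up for illustration, and the "./logs" directory is assumed from the repro steps; only the profile_batch=0 setting comes from the thread.

```python
# Keyword arguments for the TensorBoard callback. `profile_batch=0` is the
# workaround from the comments: it disables profiling entirely.
# The log directory "./logs" matches the --logdir used in the repro steps.
TENSORBOARD_KWARGS = {"log_dir": "./logs", "profile_batch": 0}


def make_tensorboard_callback():
    """Build a TensorBoard callback with profiling disabled."""
    # Imported here so the settings above can be inspected without TensorFlow.
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(**TENSORBOARD_KWARGS)
```

In the repro script, this would replace callbacks=[tf.keras.callbacks.TensorBoard()] with callbacks=[make_tensorboard_callback()]. The trade-off is that no profiling data is collected.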
