
Tensorboard hparams plugin API reports metrics from final epoch, *not* epoch with best performance on metric tracked via EarlyStopping

See original GitHub issue

System information

  • TensorFlow version: 2.3.0
  • Are you willing to contribute it: Yes, with guidance/help to know where to look

Describe the feature and the current behavior/state. The feature would change the current behavior of TensorBoard. Currently, TensorBoard displays the validation loss from the final epoch of training, which is not useful when comparing different models to each other.

Will this change the current API? How? This would change tensorboard.plugins.hparams.api to report the performance of the best training epoch rather than just the final epoch. Since the purpose of validation is to detect overfitting, reporting the best epoch is in line with the reasons for performing it.

Who will benefit from this feature? Everyone who uses TensorBoard to compare models based on the performance of the best training epoch.
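
To make the request concrete, here is a minimal sketch of a setup where this matters (the model, data, hyperparameter, and log-directory names are placeholders, not taken from the issue): validation loss is monitored with EarlyStopping, yet the HParams dashboard ends up showing the value from the last epoch that ran.

    # Illustrative sketch only; `model`, `x_train`, `x_val`, etc. are placeholders.
    import tensorflow as tf
    from tensorboard.plugins.hparams import api as hp

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)
    tb_cb = tf.keras.callbacks.TensorBoard(log_dir="logs/run1")     # writes the scalar summaries
    hp_cb = hp.KerasCallback("logs/run1", {"learning_rate": 1e-3})  # records the hparams for this run

    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=100,
              callbacks=[early_stop, tb_cb, hp_cb])
    # The HParams dashboard picks up val_loss from the final logged epoch,
    # not from the epoch that EarlyStopping judged best.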

Any other info. I am currently using a standard tf.keras.callbacks.TensorBoard instance, subclassed with the following method as the only modification:

    def on_train_end(self, logs=None):
        # Clean up whatever was written under <log_dir>/train/plugins once training ends
        plugins_dir = os.path.join(self.log_dir, "train", "plugins")
        if os.path.exists(plugins_dir):
            shutil.rmtree(plugins_dir)
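
For context, a minimal sketch of how that override might sit inside a complete subclass (the class name `PatchedTensorBoard` is hypothetical, and the `super().on_train_end` call is an addition of this sketch that preserves the stock callback's own end-of-training behavior):

    import os
    import shutil
    import tensorflow as tf

    class PatchedTensorBoard(tf.keras.callbacks.TensorBoard):  # hypothetical name
        def on_train_end(self, logs=None):
            super().on_train_end(logs)  # keep the stock callback's end-of-training cleanup
            # Remove whatever was written under <log_dir>/train/plugins during training
            plugins_dir = os.path.join(self.log_dir, "train", "plugins")
            if os.path.exists(plugins_dir):
                shutil.rmtree(plugins_dir)

    tb_callback = PatchedTensorBoard(log_dir="logs/run1")  # used like the stock TensorBoard callback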

I am also creating a callback with the following code:

    from tensorboard.plugins.hparams import api as hp
    from tensorflow.keras.metrics import CategoricalAccuracy

    # Register the hyperparameters and the metric tag ("categorical_accuracy") to track
    hp.hparams_config(
        hparams=hparams_list,
        metrics=[hp.Metric(CategoricalAccuracy().name, display_name=CategoricalAccuracy().name)],
    )
    # Log this run's hyperparameter values for the HParams dashboard
    hp_callback = hp.KerasCallback(writer=output_dir, hparams=session_hparams)
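
For comparison, one possible workaround is a sketch like the following, which skips `hp.KerasCallback` and writes the hparams plus the best validation value manually after training. This is not behavior of the hparams plugin itself; it assumes `output_dir`, `session_hparams`, `model`, the training data, and an `early_stop` EarlyStopping callback are defined as in the setup above.

    import tensorflow as tf
    from tensorboard.plugins.hparams import api as hp

    # Train with EarlyStopping only; placeholders as noted above.
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=100,
                        callbacks=[early_stop])

    # Pick the best epoch's value rather than the final one.
    best_val_acc = max(history.history["val_categorical_accuracy"])

    with tf.summary.create_file_writer(output_dir).as_default():
        hp.hparams(session_hparams)  # record this run's hyperparameter values
        tf.summary.scalar("categorical_accuracy", best_val_acc, step=0)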

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

2 reactions
nfelt commented on Feb 5, 2021

Thanks for the details @brethvoice. I think it might make the most sense as a UI option that lets the user dynamically select whether to show the latest, max, or min value of the metric as the representative value. Here’s a mockup:

[mockup image]

Does that understanding match your use case?

If you’re interested in contributing this feature I can provide some pointers on the relevant code that would need to be changed.

1 reaction
psybuzz commented on Mar 16, 2021

“does that mean I need to close the issue?”

Not at all! To chime in on the thread, having a way to view the “best” value of a metric in the Hparams dashboard sounds like a very reasonable request, and keeping this issue open is helpful for the TensorBoard team to keep track of what issues are most important to users.

In general, feature requests are typically only closed when they are irrelevant to TB, duplicates of another issue, obsolete, already fixed in a newer version, or not a product priority for the TensorBoard team. On the other hand, an issue remaining ‘open’ does not necessarily mean that it will be fixed soon, given that the TensorBoard team has to balance priorities and work with limited resources.

Contributions are welcome, but are certainly not expected as an obligation to users.


Top Results From Across the Web

  • Tensorboard hparams plugin API reports metrics from final ...
    Tensorboard hparams plugin API reports metrics from final epoch, *not* epoch with best performance on metric tracked via EarlyStopping #4630.
  • Hyperparameter Tuning with the HParams Dashboard
    Adapt TensorFlow runs to log hyperparameters and metrics; Start runs and log them all under one parent directory; Visualize the results in ...
  • Tensorboard Only Producing Epoch Logs, Not Train/Val
    Tensorboard hparams plugin API reports metrics from final epoch not epoch with best performance on metric tracked via EarlyStopping.
  • Changelog — PyTorch Lightning 1.8.6 documentation
    Fixed epoch-end logging results not being reset after the end of the epoch (#14061) ... Added CPU metric tracking to DeviceStatsMonitor (#11795).
  • Introduction to CallBacks in Tensorflow 2 - ML Hive
    Tensorflow callbacks are very important to customize behaviour of ... Here is a basic example of callback using epoch end and training end....
