
tensorboard hyperparameters don't update

See original GitHub issue

MWE:

  • Run the first code snippet below
  • Start tensorboard --logdir=lightning_logs in the same directory
  • Open the HPARAMS tab in the TensorBoard web UI
  • Only layer_1_dim is shown

Expected behavior:

  • Run the first code snippet below
  • Start tensorboard --logdir=lightning_logs in the same directory
  • Open the HPARAMS tab in the TensorBoard web UI
  • Both layer_1_dim and another_hyperparameter are shown
    • another_hyperparameter is simply left empty for version_0 (the run that did not define it)

Solved:

  • Run the second code snippet. The trick is to construct the model with the full set of hyperparameters first; TensorBoard then picks up another_hyperparameter as well

Sadly I have only little knowledge of TensorBoard and do not know what to search for. If there is simply an option that controls this, I apologise in advance, but I would appreciate a hint. Maybe this is also more of an issue for pytorch-lightning, but I just do not know. Best wishes

import pytorch_lightning as pl
from argparse import ArgumentParser
import torch

class LitMNIST(pl.LightningModule):
    def __init__(self, hparams):
        super(LitMNIST, self).__init__()
        # Assigning the parsed arguments to self.hparams is what lets the
        # TensorBoard logger record them when fit() starts.
        self.hparams = hparams
        self.layer_1 = torch.nn.Linear(28 * 28, self.hparams.layer_1_dim)

    def forward(self, *args, **kwargs):
        # Not needed for this reproduction; only hyperparameter logging matters.
        pass



if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(layer_1_dim=10)
    # First run: the model only knows about layer_1_dim.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        # fit() aborts because no training_step or dataloaders are defined,
        # but the hyperparameters have already been written to lightning_logs/.
        pass

    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    parser.add_argument('--another_hyperparameter', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(another_hyperparameter=10, layer_1_dim=10)
    # Second run: both hyperparameters are passed, but the HPARAMS tab
    # still shows only layer_1_dim.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        pass

Changed: the model is first constructed with both hyperparameters, then with only layer_1_dim

import pytorch_lightning as pl
from argparse import ArgumentParser
import torch

class LitMNIST(pl.LightningModule):
    def __init__(self, hparams):
        super(LitMNIST, self).__init__()
        self.hparams = hparams
        self.layer_1 = torch.nn.Linear(28 * 28, self.hparams.layer_1_dim)

    def forward(self, *args, **kwargs):
        pass



if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    parser.add_argument('--another_hyperparameter', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(another_hyperparameter=10, layer_1_dim=10)
    # First run: the model is constructed with both hyperparameters, so the
    # HPARAMS tab gets a column for each of them.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        pass


    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(layer_1_dim=10)
    # Second run: only layer_1_dim is passed; another_hyperparameter simply
    # stays empty for this run in the HPARAMS view.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        pass

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8

Top GitHub Comments

1 reaction
LarsHill commented, Apr 9, 2022

I would like to pick this issue back up. I am not fully convinced that this is an issue on the torch.utils.tensorboard side. If I have two independent runs with different summary writer instances, and each run logs to a different directory (e.g. ~/001 and ~/002), then I can point TensorBoard at each of the logdirs and see the full set of hyperparameters for that run. Now I want to compare both runs in a single view, so I point TensorBoard at the parent dir, namely ~/. If I check the hparams view again, I am left with only the intersection of the two hyperparameter sets: all hyperparameters that are unique to one of the runs are no longer shown.

To me that suggests that, individually, everything is logged properly via torch.utils.tensorboard, but that TensorBoard itself cannot properly iterate over all event files and build the complete hyperparameter table when it starts up.

Any thoughts on that? Or am I missing something? If this is a different issue, I am also happy to open a new issue for it.

*edit: After searching a bit more, it seems my problem is related to: https://github.com/tensorflow/tensorboard/issues/2942
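
For what it's worth, the setup described above can be sketched without PyTorch Lightning, using only torch.utils.tensorboard. The log directories runs/001 and runs/002 and the metric name below are placeholders, not taken from the original report; if the reported behaviour holds, pointing TensorBoard at the parent directory again shows only the shared hyperparameter columns.

from torch.utils.tensorboard import SummaryWriter

# Run 1: logs a single hyperparameter.
writer = SummaryWriter(log_dir="runs/001")
writer.add_hparams({"layer_1_dim": 10}, {"hparam/loss": 0.5})
writer.close()

# Run 2: logs an additional hyperparameter.
writer = SummaryWriter(log_dir="runs/002")
writer.add_hparams({"layer_1_dim": 20, "another_hyperparameter": 3},
                   {"hparam/loss": 0.4})
writer.close()

# tensorboard --logdir=runs/001  -> HPARAMS shows layer_1_dim
# tensorboard --logdir=runs/002  -> HPARAMS shows both hyperparameters
# tensorboard --logdir=runs      -> only the shared column(s) remain,
#                                   matching the behaviour described above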

0 reactions
nfelt commented, Jul 1, 2020

This looks like an issue in the PyTorch TensorBoard SummaryWriter implementation, which is maintained by PyTorch, not us; their API only supports writing all hparams in a single shot. I'd recommend following up at https://github.com/pytorch/pytorch/issues/39250.
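
Until that changes upstream, one workaround in the spirit of the original poster's trick is to log the same set of hyperparameter keys from every run, padding the keys a run does not use with a placeholder value, so the combined HPARAMS view keeps a column for each of them. A minimal sketch, with illustrative key names, placeholder value, and paths:

from torch.utils.tensorboard import SummaryWriter

# Union of all hyperparameter keys used across runs (illustrative).
ALL_HPARAM_KEYS = ("layer_1_dim", "another_hyperparameter")

def log_hparams(log_dir, hparams, metrics):
    # Pad missing keys so every run writes the same hparam schema.
    padded = {key: hparams.get(key, 0) for key in ALL_HPARAM_KEYS}
    writer = SummaryWriter(log_dir=log_dir)
    writer.add_hparams(padded, metrics)
    writer.close()

log_hparams("runs/001", {"layer_1_dim": 10}, {"hparam/loss": 0.5})
log_hparams("runs/002",
            {"layer_1_dim": 20, "another_hyperparameter": 3},
            {"hparam/loss": 0.4})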

Read more comments on GitHub >

Top Results From Across the Web

Why does tensorboard not show all metrics? - Stack Overflow
Try to restart tensorboard. Tensorboard seems to have an issue reliably detecting new scalar values for the 'HPARAMS' section.
Read more >
Deep Dive Into TensorBoard: Tutorial With Examples
Hyperparameter tuning with TensorBoard. The dashboard is available under the HPARAMS tab. To achieve this you have to clear the previous logs and...
Read more >
Tune hyperparameters in your custom training loop - Keras
First, we import the libraries we need, and we create datasets for training and validation. Here, we just use some random data for...
Read more >
Easy Hyperparameter Tuning with Keras Tuner and TensorFlow
To learn how to tune hyperparameters with Keras Tuner, just keep reading. ... stop, and resume hyperparameter tuning experiments.
Read more >
Stop Training Jobs Early - Amazon SageMaker
Stop the training jobs that a hyperparameter tuning job launches early when they are not improving significantly as measured by the objective metric....
Read more >
