
tensorboard hyperparameters don't update

See original GitHub issue

MWE:

  • Run the first code snippet below
  • Start tensorboard --logdir=lightning_logs in the same directory
  • Open the HPARAMS tab in the TensorBoard web UI
  • Only layer_1_dim is shown

Expected behavior:

  • Run the first code snippet below
  • Start tensorboard --logdir=lightning_logs in the same directory
  • Open the HPARAMS tab in the TensorBoard web UI
  • Both layer_1_dim and another_hyperparameter are shown
    • another_hyperparameter is simply left empty for version_0 (the run that did not define it)

Solved:

  • Run the second code snippet. The trick is to construct the model with the full set of hyperparameters first; TensorBoard then picks up another_hyperparameter as well

Sadly I have only little knowledge of TensorBoard and do not know what to search for. If there is simply an option that controls this, I apologise in advance, but I would appreciate a hint. Maybe this is also more of an issue for pytorch-lightning, but I just do not know. Best wishes

import pytorch_lightning as pl
from argparse import ArgumentParser
import torch

class LitMNIST(pl.LightningModule):
    def __init__(self, hparams):
        super(LitMNIST, self).__init__()
        # Assigning the parsed arguments to self.hparams is what lets the
        # TensorBoard logger record them when fit() starts.
        self.hparams = hparams
        self.layer_1 = torch.nn.Linear(28 * 28, self.hparams.layer_1_dim)

    def forward(self, *args, **kwargs):
        # Not needed for this reproduction; only hyperparameter logging matters.
        pass



if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(layer_1_dim=10)
    # First run: the model only knows about layer_1_dim.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        # fit() aborts because no training_step or dataloaders are defined,
        # but the hyperparameters have already been written to lightning_logs/.
        pass

    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    parser.add_argument('--another_hyperparameter', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(another_hyperparameter=10, layer_1_dim=10)
    # Second run: both hyperparameters are passed, but the HPARAMS tab
    # still shows only layer_1_dim.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        pass

Changed: the model is first constructed with both hyperparameters, then with only layer_1_dim

import pytorch_lightning as pl
from argparse import ArgumentParser
import torch

class LitMNIST(pl.LightningModule):
    def __init__(self, hparams):
        super(LitMNIST, self).__init__()
        self.hparams = hparams
        self.layer_1 = torch.nn.Linear(28 * 28, self.hparams.layer_1_dim)

    def forward(self, *args, **kwargs):
        pass



if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    parser.add_argument('--another_hyperparameter', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(another_hyperparameter=10, layer_1_dim=10)
    # First run: the model is constructed with both hyperparameters, so the
    # HPARAMS tab gets a column for each of them.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        pass


    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(layer_1_dim=10)
    # Second run: only layer_1_dim is passed; another_hyperparameter simply
    # stays empty for this run in the HPARAMS view.
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)
    except Exception:
        pass

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8

Top GitHub Comments

1 reaction
LarsHill commented, Apr 9, 2022

I would like to pick this issue back up. I am not fully convinced that this is an issue on the torch.utils.tensorboard side. If I have two independent runs with different summary writer instances, and each run logs to a different directory (e.g. ~/001 and ~/002), then I can point TensorBoard at each of the logdirs and see the full set of hyperparameters for that run. Now I want to compare both runs in a single view, so I point TensorBoard at the parent dir, namely ~/. If I check the hparams view again, I am left with only the intersection of the two hyperparameter sets: all hyperparameters that are unique to one of the runs are no longer shown.

To me that suggests that, individually, everything is logged properly via torch.utils.tensorboard, but that TensorBoard itself cannot properly iterate over all event files and build the complete hyperparameter table when it starts up.

Any thoughts on that? Or am I missing something? If this is a different issue, I am also happy to open a new issue for it.

*edit: After searching a bit more, it seems my problem is related to: https://github.com/tensorflow/tensorboard/issues/2942
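
For what it's worth, the setup described above can be sketched without PyTorch Lightning, using only torch.utils.tensorboard. The log directories runs/001 and runs/002 and the metric name below are placeholders, not taken from the original report; if the reported behaviour holds, pointing TensorBoard at the parent directory again shows only the shared hyperparameter columns.

from torch.utils.tensorboard import SummaryWriter

# Run 1: logs a single hyperparameter.
writer = SummaryWriter(log_dir="runs/001")
writer.add_hparams({"layer_1_dim": 10}, {"hparam/loss": 0.5})
writer.close()

# Run 2: logs an additional hyperparameter.
writer = SummaryWriter(log_dir="runs/002")
writer.add_hparams({"layer_1_dim": 20, "another_hyperparameter": 3},
                   {"hparam/loss": 0.4})
writer.close()

# tensorboard --logdir=runs/001  -> HPARAMS shows layer_1_dim
# tensorboard --logdir=runs/002  -> HPARAMS shows both hyperparameters
# tensorboard --logdir=runs      -> only the shared column(s) remain,
#                                   matching the behaviour described above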

0 reactions
nfelt commented, Jul 1, 2020

This looks like an issue in the PyTorch TensorBoard SummaryWriter implementation, which is maintained by PyTorch, not us; their API only supports writing all hparams in a single shot. I'd recommend following up at https://github.com/pytorch/pytorch/issues/39250.
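
Until that changes upstream, one workaround in the spirit of the original poster's trick is to log the same set of hyperparameter keys from every run, padding the keys a run does not use with a placeholder value, so the combined HPARAMS view keeps a column for each of them. A minimal sketch, with illustrative key names, placeholder value, and paths:

from torch.utils.tensorboard import SummaryWriter

# Union of all hyperparameter keys used across runs (illustrative).
ALL_HPARAM_KEYS = ("layer_1_dim", "another_hyperparameter")

def log_hparams(log_dir, hparams, metrics):
    # Pad missing keys so every run writes the same hparam schema.
    padded = {key: hparams.get(key, 0) for key in ALL_HPARAM_KEYS}
    writer = SummaryWriter(log_dir=log_dir)
    writer.add_hparams(padded, metrics)
    writer.close()

log_hparams("runs/001", {"layer_1_dim": 10}, {"hparam/loss": 0.5})
log_hparams("runs/002",
            {"layer_1_dim": 20, "another_hyperparameter": 3},
            {"hparam/loss": 0.4})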

Read more comments on GitHub >

Top Results From Across the Web

Why does tensorboard not show all metrics? - Stack Overflow
Try to restart tensorboard. Tensorboard seems to have an issue reliably detecting new scalar values for the 'HPARAMS' section.
Read more >
Deep Dive Into TensorBoard: Tutorial With Examples
Hyperparameter tuning with TensorBoard. The dashboard is available under the HPARAMS tab. To achieve this you have to clear the previous logs and...
Read more >
Tune hyperparameters in your custom training loop - Keras
First, we import the libraries we need, and we create datasets for training and validation. Here, we just use some random data for...
Read more >
Easy Hyperparameter Tuning with Keras Tuner and TensorFlow
To learn how to tune hyperparameters with Keras Tuner, just keep reading. ... stop, and resume hyperparameter tuning experiments.
Read more >
Stop Training Jobs Early - Amazon SageMaker
Stop the training jobs that a hyperparameter tuning job launches early when they are not improving significantly as measured by the objective metric....
Read more >
