
Controlling the global step in TrainResult.log and EvalResult.log

See original GitHub issue

I can't find a way to control the global step when logging metrics to TrainResult using the log (or log_dict) functions. What is the proper way to use these functions so that the logged metrics show up in TensorBoard once per epoch? Currently, the steps shown in my TensorBoard are (63*i - 1) for i = 1, 2, and so on. This is my training_step function (validation_step is similar, using pl.EvalResult):

def training_step(self, batch, batch_idx) -> pl.TrainResult:
    x, mask = batch
    pred = self(x)                             # forward pass
    loss = self.loss_function(pred, mask)      # loss used for backpropagation
    result = pl.TrainResult(loss)
    # Log an extra metric; this is what should appear in TensorBoard.
    result.log("Trainer/cross_entropy_loss", self.cross_entropy_loss(pred, mask))
    return result

I tried setting most of the TrainResult.log parameters manually (on_epoch, logger, sync_dist, reduce_fx, and so on).
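
For reference, a minimal sketch of that kind of attempt with the 0.9.0 result API (on_step and on_epoch are existing TrainResult.log parameters; whether this alone gives a once-per-epoch step axis is exactly what this issue is asking about):

def training_step(self, batch, batch_idx) -> pl.TrainResult:
    x, mask = batch
    pred = self(x)
    result = pl.TrainResult(self.loss_function(pred, mask))
    # Ask Lightning to aggregate the metric over the epoch and log it
    # once per epoch instead of once per optimizer step.
    result.log(
        "Trainer/cross_entropy_loss",
        self.cross_entropy_loss(pred, mask),
        on_step=False,
        on_epoch=True,
    )
    return result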

  • OS: Ubuntu 18.04.4
  • NVIDIA driver version: 440.33.01
  • CUDA versions available: cuda-9.0, cuda-9.2, cuda-10.0, cuda-10.1, cuda-10.2 (default: cuda-10.0)
  • torch==1.5.1
  • pytorch-lightning==0.9.0

Thanks a lot in advance.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
ydcjeff commented, Sep 3, 2020

Hi @shalgi-beyond, setting trainer = pl.Trainer(gpus=1, log_save_interval=1, row_log_interval=1) would do the trick. Since I am not quite familiar with TensorBoard, I have created a question on the forum for you: https://forums.pytorchlightning.ai/t/log-save-interval-and-row-log-interval/135
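
For context, a minimal sketch of the suggested configuration (flag names as in pytorch-lightning 0.9.0; model is a placeholder for your LightningModule instance):

import pytorch_lightning as pl

# Write a logger row on every step and flush logs to disk every step,
# instead of batching them into the default intervals.
trainer = pl.Trainer(gpus=1, log_save_interval=1, row_log_interval=1)
trainer.fit(model)  # `model` is your LightningModule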

0 reactions
stale[bot] commented, Oct 21, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

Read more comments on GitHub >

Top Results From Across the Web

Controlling the global step in TrainResult.log and EvalResult.log
I can't find a way to control the global step when logging metrics to TrainResult using the log (or log_dict) functions.
Read more >
Log_save_interval and row_log_interval - Trainer - Lightning AI
I noticed logging in tensorboard is done at row_log_interval ... GH issue: Controlling the global step in TrainResult.log and EvalResult.log ...
Read more >
[PyTorch Lightning] Log Training Losses when Accumulating ...
[PyTorch Lightning] Log Training Losses when Accumulating Gradients. The global step is not what you think it is.
Read more >
synced BatchNorm, DataModules and final API! | by PyTorch ...
They are meant to control where and when to log and how synchronization is done ... TrainResult default is to log every step...
Read more >
PyTorch-Lightning Documentation
The return object TrainResult controls where to log, when to log ... Note: Lightning saves all aspects of training (epoch, global step, etc....
Read more >
