
Logger connector callback metrics only contains the last step

See original GitHub issue

🐛 Bug

When I do

def training_step(...):
   self.log("train_loss", ...)

and see that log() by default uses mean reduction, I expect the "train_loss" value returned in the logger connector's callback metrics to be the average training loss across all examples. But right now it is actually the training loss of the last batch, because Lightning treats this as an on_step metric. I would argue that this is very unintuitive (and also undocumented). It is especially problematic when this metric is used to, e.g., perform epoch selection, since the performance on a single batch can have large variance. Worse, each epoch can have a different batch order, and hence a different last batch, so the loss value isn't really comparable across epochs.

If this is a design decision, could you at least let me know how I could achieve my intended purpose?
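To make the complaint concrete, here is a plain-Python sketch (no Lightning involved, made-up loss values) contrasting the last-batch value that currently ends up in the callback metrics with the epoch mean the reporter expected. In Lightning itself, per the Logging documentation, passing `on_epoch=True` to `self.log` asks it to accumulate and log the epoch-level aggregate instead of the per-step value.

```python
# Simulated per-batch training losses for one epoch (made-up values).
batch_losses = [0.90, 0.75, 0.60, 0.52, 0.48]

# What the issue reports: callback_metrics["train_loss"] holds only the
# value logged at the last step, i.e. the last batch's loss. This depends
# on batch order and has high variance across epochs.
last_batch_loss = batch_losses[-1]

# What the reporter expected: the mean reduction over the whole epoch,
# a stable summary that is comparable across epochs.
epoch_mean_loss = sum(batch_losses) / len(batch_losses)

print("last batch:", last_batch_loss)
print("epoch mean:", epoch_mean_loss)
```
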

cc @carmocca @edward-io @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
ZhaofengWu commented, May 31, 2022

Got it, thanks a lot!

0 reactions
carmocca commented, May 31, 2022

You can skip step (3) and print both validation_metric and train_loss in on_train_epoch_end. This will also work when validation runs multiple times per epoch.
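A minimal sketch of that suggestion. A stub object stands in for the real `Trainer` so the snippet runs standalone; in practice you would subclass `pytorch_lightning.Callback` and read `trainer.callback_metrics` the same way. The metric names and values here are illustrative (`validation_metric` is the hypothetical name used in the thread):

```python
from types import SimpleNamespace

class EpochMetricsPrinter:
    """Sketch of a Lightning-style callback: on_train_epoch_end reads
    whatever the logger connector has accumulated in callback_metrics."""

    def on_train_epoch_end(self, trainer, pl_module):
        metrics = trainer.callback_metrics
        print(f"train_loss={metrics['train_loss']}, "
              f"validation_metric={metrics['validation_metric']}")

# Stub carrying the callback_metrics mapping a real Trainer exposes.
trainer = SimpleNamespace(callback_metrics={
    "train_loss": 0.65,          # made-up values
    "validation_metric": 0.91,
})
EpochMetricsPrinter().on_train_epoch_end(trainer, pl_module=None)
```

Because the hook fires at the end of the training epoch, both metrics are already present even when validation ran several times during that epoch.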


Top Results From Across the Web

Logging — PyTorch Lightning 1.8.5.post0 documentation
The log() method has a few options: on_step : Logs the metric at the current step. on_epoch : Automatically accumulates and logs at...
Read more >
A Guide To Callbacks & Metrics in Tune — Ray 2.2.0
This simple callback just prints a metric each time a result is received: from ray import tune from ray.tune import Callback from ray.air...
Read more >
Custom callback after each epoch to log certain information
Here is the solution, by subclassing Callback : from keras.callbacks import Callback class MyLogger(Callback): def on_epoch_end(self, epoch, ...
Read more >
database-metrics-logger - npm
log database metrics. Latest version: 0.8.4, last published: 3 years ago. Start using database-metrics-logger in your project by running `npm i ...
Read more >
