Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Validation step metrics not logged

See original GitHub issue

❓ Questions and Help

It seems like the data returned from validation_step does not get logged to TensorBoard; it first needs to be aggregated in validation_epoch_end, which is not the case for training_step.

The code below only shows val_loss, and only in aggregated form, instead of all of mae, mape, etc. from every iteration.

As a workaround I could log explicitly, but how do I get the current iteration? In the callbacks I only see how to get the current epoch.

    def step(self, y_hat, y, mode='train'):
        loss = F.mse_loss(y_hat, y)
        mae = F.l1_loss(y_hat, y)
        mape = median_absolute_percentage_error(y_hat, y)
        r2 = r2_score(y_hat, y)
        out = {'loss': loss, 'mae': mae, 'mape': mape, 'R2': r2}
        if mode == 'train':
            out['log'] = out.copy()
            return out
        elif mode == 'val':
            # Prefix the keys so validation metrics don't collide with training ones.
            out = {f'{mode}_{k}': v for k, v in out.items()}
            out['log'] = out.copy()
            return out
        else:
            raise ValueError(f'Unsupported mode: {mode}')

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        return self.step(y_hat, y, 'val')

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}
        return {'val_loss': avg_loss, 'log': tensorboard_logs}
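One way to get mae, mape, and R2 into TensorBoard (in aggregated form) is to average every returned key in validation_epoch_end rather than only val_loss. Here is a minimal sketch of that key-wise averaging, written with plain Python floats so it is self-contained; in the LightningModule the values are tensors and you would use torch.stack([...]).mean() instead of statistics.mean. The helper name aggregate_val_outputs is hypothetical.

```python
from statistics import mean

def aggregate_val_outputs(outputs):
    # Average every metric key across the per-batch dicts returned by
    # validation_step, skipping the nested 'log' dict.
    keys = [k for k in outputs[0] if k != 'log']
    logs = {k: mean(o[k] for o in outputs) for k in keys}
    # Return the structure validation_epoch_end is expected to produce:
    # the aggregated loss plus a 'log' dict for the logger.
    return {'val_loss': logs['val_loss'], 'log': logs}

# Two simulated validation batches:
outputs = [
    {'val_loss': 0.4, 'val_mae': 0.5, 'val_mape': 0.1, 'val_R2': 0.8},
    {'val_loss': 0.2, 'val_mae': 0.3, 'val_mape': 0.3, 'val_R2': 0.9},
]
result = aggregate_val_outputs(outputs)
print(result['log'])
```

This keeps the per-epoch granularity the loggers expect while still surfacing every metric, not just the loss.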

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
awaelchli commented, Jun 7, 2020

You can get the current step with self.global_step or self.trainer.global_step.

Yes, it’s true you can log it manually yourself, but there is a reason why we don’t log each validation step: the loggers (all of them, as far as I know) use a global step for logging. This means that if your training epoch has n batches and your validation has m batches, after the first epoch you will have logged n + m steps for training + validation, and your training loss will then continue at step n + m + 1 instead of n + 1 for epoch 2. You will see a big jump in the visualization.

TensorBoard is too limited: you cannot set the abscissa to anything other than the step (as far as I know). Therefore logging each validation step makes little sense.
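The jump described above can be illustrated with a toy step counter (a hypothetical helper, not part of Lightning): if every logged batch advances the shared global step, then with n training and m validation batches per epoch, epoch 2's training curve resumes at step n + m instead of n.

```python
def first_step_of_epoch(epoch, n_train, m_val, log_val_steps):
    # Toy model of a logger whose global step advances once per logged
    # batch; `epoch` is 0-indexed. If validation steps are logged too,
    # they also consume global steps.
    steps_per_epoch = n_train + (m_val if log_val_steps else 0)
    return epoch * steps_per_epoch

# With n = 100 training batches and m = 20 validation batches:
print(first_step_of_epoch(1, 100, 20, log_val_steps=False))  # training-only logging
print(first_step_of_epoch(1, 100, 20, log_val_steps=True))   # per-step validation logging
```

Under per-step validation logging the training curve for epoch 2 starts 20 steps later than it would otherwise, which is the visual jump the comment warns about.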

0 reactions
stale[bot] commented, Aug 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


Top Results From Across the Web

Validation step metrics not logged · Issue #2102 · Lightning-AI ...
It seems like data output in the validation_step does not get logged to tensorboard, it needs to be aggregated first in the validation_epoch_end ...

How to monitor both train and validation metrics at the same ...
Hi, I am finetuning BertForSequenceClassification using run_glue.py and I would like to output every logging_steps all the performance ...

TorchMetrics in PyTorch Lightning
Logging metrics can be done in two ways: either logging the metric object directly or the computed metric values. When Metric objects, which ...

Metrics & Performance - Documentation - Weights & Biases
Can I log metrics on two different time scales? (For example, I want to log training accuracy per batch and validation accuracy per ...

TensorBoard Scalars: Logging training metrics in Keras
TensorBoard Scalars: Logging training metrics in Keras ... note how both training and validation loss rapidly decrease ... Not bad!
