Validation step metrics not logged
❓ Questions and Help
It seems like data output in validation_step does not get logged to TensorBoard; it needs to be aggregated first in validation_epoch_end, which is not the case for training_step.
With the code below, validation only shows an aggregated val_loss, while training shows all of mae, mape, etc. from every iteration.
As a workaround I could log explicitly, but how do I get the current iteration in the callbacks? I only see how to get the current epoch.
import torch
import torch.nn.functional as F

# r2_score and median_absolute_percentage_error are metric helpers assumed to
# be defined or imported elsewhere.

def step(self, y_hat, y, mode='train'):
    # Compute the loss and the extra metrics for a single batch.
    loss = F.mse_loss(y_hat, y)
    mae = F.l1_loss(y_hat, y)
    mape = median_absolute_percentage_error(y_hat, y)
    r2 = r2_score(y_hat, y)
    out = {'loss': loss, 'mae': mae, 'mape': mape, 'R2': r2}
    if mode == 'train':
        out['log'] = out.copy()
        return out
    elif mode == 'val':
        # Prefix the keys so validation metrics don't collide with training ones.
        out = {f'{mode}_{k}': v for k, v in out.items()}
        out['log'] = out.copy()
        return out
    else:
        raise Exception('Unsupported mode')

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    return self.step(y_hat, y, 'val')

def validation_epoch_end(self, outputs):
    # Aggregate the per-batch validation losses into one epoch-level value.
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    tensorboard_logs = {'val_loss': avg_loss}
    return {'val_loss': avg_loss, 'log': tensorboard_logs}
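For comparison, the training side referred to above might look like the following minimal sketch (the original issue does not show the training_step; this simply reuses the same step() helper in 'train' mode, whose 'log' dict the logger picks up on every training iteration):

def training_step(self, batch, batch_idx):
    # Hypothetical sketch: same step() helper, 'train' mode, so every
    # training batch contributes a 'log' dict of per-iteration metrics.
    x, y = batch
    y_hat = self(x)
    return self.step(y_hat, y, 'train')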

You can get the current step with self.global_step or self.trainer.global_step.
Yes, it's true you can manually log it yourself, but there is a reason why we don't log each step of the validation: the loggers (all of them, as far as I know) use a global step for logging. This means that if your training epoch has n batches and your validation has m batches, after the first epoch you will have logged n + m steps for training + validation, and your training loss will then continue at step n + m + 1 instead of n + 1 for epoch 2. You will see a big jump in the visualization.
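As a rough illustration of the manual-logging workaround mentioned above, a sketch of a validation_step that writes each batch's metrics straight to the underlying TensorBoard SummaryWriter (via self.logger.experiment) could look like this; note that global_step does not advance during validation, so all batches of one validation run land on the same x value, which is exactly the step-mismatch caveat described above:

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    out = self.step(y_hat, y, 'val')
    # Write each per-batch validation metric directly to TensorBoard,
    # using the trainer's global step as the x-axis value.
    for name, value in out['log'].items():
        self.logger.experiment.add_scalar(name, value, self.global_step)
    return out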
TensorBoard is too limited; you cannot set the abscissa to anything other than the step (as far as I know). Therefore, logging per validation step makes no sense.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.