Question about return value of `validation_epoch_end`
❓ Questions and Help
What is your question?
I’m a bit confused about what to return from methods like `validation_epoch_end` and what to put inside their `log` member.
Based on the docs, the `log` member of the return value of `validation_epoch_end` is mainly for logging and plotting, right?
In the MNIST example, if I change the validation_epoch_end method to
    def validation_epoch_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}
        return {'avg_val_loss': avg_loss}
I get a RuntimeWarning: Can save best model only with val_loss available, skipping. It seems it is looking for metrics inside the `log` member to determine the best model.
If I change the `training_step` method to
    def training_step(self, batch, batch_nb):
        # REQUIRED
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {'train_loss': loss}
        return {'log': tensorboard_logs}
and only put train_loss inside `log`, I get a RuntimeError: No loss value in the dictionary returned from model.training_step(). It seems that some procedure looks for the value in the return dict itself, not in its `log` member.
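For reference, here is a minimal sketch of a `training_step` that satisfies both consumers (assuming the 0.7.x dict-based API): the trainer backpropagates the top-level `loss` key, while the `log` dict is only forwarded to the logger.

```python
import torch
import torch.nn.functional as F

def training_step(self, batch, batch_nb):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # 'loss' (top level) is what the trainer uses for backward();
    # 'log' entries only go to the logger (e.g. TensorBoard).
    return {'loss': loss, 'log': {'train_loss': loss}}
```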
I’m confused about what to put inside these methods’ return value and their log member.
Update:
Now I’ve encountered this issue, and I’m getting more and more confused about why the test result is found in the `progress_bar` member of the return value…
Maybe I’m missing something, but I didn’t find details on all of these in the docs.
Versions
pytorch-lightning: 0.7.1.
Issue Analytics
- State:
- Created 4 years ago
- Comments: 5 (3 by maintainers)

You need to return whatever metric the checkpoint callback is using to monitor the best model. In this case, val_loss is used to monitor for the best model, and you need to return it separately from the logs.
In the same vein, the backward pass is performed on the “loss” key of the dict returned from training_step, so you need to have a “loss” entry in that return value.
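Putting that advice together, a minimal sketch of a `validation_epoch_end` that keeps checkpointing working (assuming the 0.7.x API, where ModelCheckpoint monitors val_loss by default): return val_loss at the top level, separately from the logs.

```python
import torch

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    tensorboard_logs = {'val_loss': avg_loss}
    # Top-level 'val_loss' is what the checkpoint callback monitors;
    # the 'log' dict is only sent to the logger.
    return {'val_loss': avg_loss, 'log': tensorboard_logs}
```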
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.