Question about return value of `validation_epoch_end`
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
I’m a bit confused about what to return from methods like `validation_epoch_end` and what to put inside its `log` member.
Based on the docs, is the `log` member of the return value of `validation_epoch_end` mainly for logging and plotting?
In the MNIST example, if I change the `validation_epoch_end` method to

    def validation_epoch_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}
        return {'avg_val_loss': avg_loss}

I get `RuntimeWarning: Can save best model only with val_loss available, skipping.` It seems that it’s looking for metrics inside the `log` member to determine the best model.
If I change the `training_step` method to

    def training_step(self, batch, batch_nb):
        # REQUIRED
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {'train_loss': loss}
        return {'log': tensorboard_logs}

and only put `train_loss` inside `log`, I get `RuntimeError: No loss value in the dictionary returned from model.training_step().`
It seems that some procedure is looking for a value in the return value itself, not in its `log` member.
I’m confused about what to put inside these methods’ return value and their `log` member.
Updated:
Now that I have encountered this issue, I’m getting more and more confused about why the test result is looked up in the `progress_bar` member of the return value…
Maybe I’m missing something, but I didn’t find details of all of this in the docs.
Versions
pytorch-lightning: 0.7.1
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
You need to return whatever metric the checkpoint callback is using to monitor the best model. In this case, `val_loss` is the monitored metric, so you need to return it at the top level of the dict, separately from the entries under `log`.
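A minimal sketch of that return shape, assuming the dict-based API from the question (0.7.x). Plain floats stand in for tensors so the snippet runs without `torch`; in a real `LightningModule` this would be a method taking `self` and would average with `torch.stack(...).mean()` as in the question.

```python
def validation_epoch_end(outputs):
    """Sketch: average per-batch losses and return the monitored metric at top level."""
    # Floats stand in for the tensors a real validation_step would return.
    avg_loss = sum(x['val_loss'] for x in outputs) / len(outputs)

    # Metrics intended for the logger (e.g. TensorBoard) go under 'log'.
    tensorboard_logs = {'val_loss': avg_loss}

    # 'val_loss' is also returned at the top level so that a checkpoint
    # callback monitoring 'val_loss' can find it.
    return {'val_loss': avg_loss, 'log': tensorboard_logs}


result = validation_epoch_end([{'val_loss': 0.25}, {'val_loss': 0.75}])
print(result)
```

The key returned at the top level has to match the name the checkpoint callback monitors; returning it only as `avg_val_loss` is what triggers the warning quoted in the question.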
In the same vein, the backward pass is performed on the `loss` key of the dict returned from `training_step`, so that dict needs a top-level `loss` entry.
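A rough sketch of why the `RuntimeError` appears, under the assumption that the trainer simply looks up `loss` at the top level of the returned dict. `extract_loss` is a hypothetical stand-in written for illustration, not Lightning's actual code, and floats stand in for tensors.

```python
def training_step_output(loss):
    """Sketch of a training_step return value: 'loss' at top level, logger metrics under 'log'."""
    return {'loss': loss, 'log': {'train_loss': loss}}


def extract_loss(output):
    """Hypothetical stand-in for the trainer's lookup of the value to backpropagate."""
    # Only the top-level 'loss' key is consulted; entries under 'log' are ignored here.
    if 'loss' not in output:
        raise RuntimeError(
            'No loss value in the dictionary returned from training_step'
        )
    return output['loss']


good = training_step_output(0.3)
print(extract_loss(good))

# The variant from the question, with the loss only inside 'log', fails the lookup.
bad = {'log': {'train_loss': 0.3}}
try:
    extract_loss(bad)
except RuntimeError as err:
    print('RuntimeError:', err)
```

Under this reading, `log` is purely for the logger, and anything the training loop or a callback needs has to be returned alongside it at the top level.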