
Question about return value of `validation_epoch_end`


❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

I’m a bit confused about what to return from methods like validation_epoch_end and what to put inside their log member.

Based on the docs, is the log member of the return value of validation_epoch_end mainly for logging and plotting?

In the MNIST example, if I change the validation_epoch_end method to


def validation_epoch_end(self, outputs):
    # OPTIONAL
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    tensorboard_logs = {'val_loss': avg_loss}
    return {'avg_val_loss': avg_loss}

I get a RuntimeWarning: Can save best model only with val_loss available, skipping. It seems that it looks for metrics inside the log member to determine the best model.

If I change the training_step method to


def training_step(self, batch, batch_nb):
    # REQUIRED
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    tensorboard_logs = {'train_loss': loss}
    return {'log': tensorboard_logs}

and only put train_loss inside log, I get a RuntimeError: No loss value in the dictionary returned from model.training_step(). It seems that some procedure looks for the loss value in the return dict itself, not in its log member.

I’m confused about what to put in these methods’ return values and what belongs in their log member.


Updated:

Now I’ve run into this issue as well, and I’m getting more and more confused about why the test result ends up in the progress_bar member of the return value…
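For context, the pattern I mean looks roughly like this (an illustrative sketch of what I’ve seen, not something I found in the docs; the names here are just placeholders):

def test_epoch_end(self, outputs):
    avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
    logs = {'test_loss': avg_loss}
    # the metrics placed under 'progress_bar' seem to be what gets reported
    # as the test result, which is the part that confuses me
    return {'avg_test_loss': avg_loss, 'log': logs, 'progress_bar': logs}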

Maybe I’m missing something, but I didn’t find the details of any of this in the docs.

Versions

pytorch-lightning: 0.7.1.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
dscarmo commented, Mar 15, 2020

You need to return whatever metric the checkpoint callback is monitoring to select the best model. In this case that metric is val_loss, and you need to return it separately from the logs.

In the same vein, the backward pass is performed on the “loss” key of the dict returned from training_step, so you need to include a “loss” entry in that return value.
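For concreteness, here is a minimal sketch of both methods following that advice, using the same 0.7.x-style dict returns as the snippets in the question (torch and F imported as in the MNIST example; val_loss is returned at the top level and also placed under log, so it is both monitored and logged):

def training_step(self, batch, batch_nb):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # 'loss' is the value the backward pass uses; 'log' is only for logging
    return {'loss': loss, 'log': {'train_loss': loss}}

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    # 'val_loss' is the metric the checkpoint callback monitors;
    # putting it under 'log' as well sends it to TensorBoard
    return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}

That should both silence the “Can save best model” warning and satisfy the loss lookup after training_step.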

0 reactions
stale[bot] commented, May 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

