
training_epoch_end needs to return a "loss" key in the dict


📚 Documentation

Hi everyone!

In the docs detailing the usage of the logging function training_epoch_end, the code will fail if "loss": loss is not explicitly included in the returned dict.

The docs at https://pytorch-lightning.readthedocs.io/en/latest/experiment_reporting.html#log-metrics are not correct:

def training_epoch_end(self, outputs):
    loss = some_loss()
    ...

    logs = {'train_loss': loss}
    results = {'log': logs}
    return results

should be changed to

def training_epoch_end(self, outputs):
    loss = some_loss()
    ...

    logs = {'train_loss': loss}
    results = {'loss': loss, 'log': logs}  # <-- here is the change
    return results

Use case: I want to monitor the training metrics to check whether my network is able to overfit the data (for this functionality, see #1076).
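
For context, here is a minimal sketch of where such a return dict sits inside a 0.7-era LightningModule; the model, loss function, and optimizer below are illustrative placeholders, not taken from the issue:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class OverfitCheck(pl.LightningModule):
    # Minimal module that logs the training loss so that overfitting
    # a small dataset is visible in the logger.
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        # 'loss' is the key the training loop consumes for backprop;
        # 'log' holds the metrics sent to the logger.
        return {'loss': loss, 'log': {'train_loss': loss}}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)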

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
rpatrik96 commented, Mar 21, 2020

Oh, you are right, I was using an old version; interestingly, I got no warning even on a server running the newest Lightning version. Is there any way to enable explicit warnings about soon-to-be-deprecated features? I like to keep my code as up to date as possible, although that does not always happen, as in this case. Anyway, I am closing this issue.
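
As a general Python note (not Lightning-specific), deprecation warnings are often hidden by the interpreter's default warning filters; a minimal sketch to surface them:

import warnings

# Force DeprecationWarning to always display; Python hides it by default
# outside of __main__. Equivalent to running with
# `python -W always::DeprecationWarning`.
warnings.simplefilter("always", DeprecationWarning)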

0 reactions
awaelchli commented, Mar 21, 2020

training_epoch_end does not exist yet, right (#1076)? When you say the code fails, you probably used the old training_end, which operated on batches and has been renamed to training_step_end. There the loss key is required.

I don’t see why the loss key should be mandatory in training_epoch_end. As far as I know, validation_epoch_end also does not require a loss key in the return dict.
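
To make the distinction concrete, here is a sketch of the two contracts as described above; the val_loss key and the logged metric names are assumptions for illustration, not taken from the issue:

import torch

def training_step_end(self, outputs):
    # Formerly training_end: runs after each training step, and the
    # returned dict must contain a 'loss' key for backpropagation.
    loss = outputs['loss']
    return {'loss': loss, 'log': {'train_loss': loss}}

def validation_epoch_end(self, outputs):
    # Runs once per validation epoch; no 'loss' key is required here,
    # only whatever metrics should be logged.
    avg_loss = torch.stack([o['val_loss'] for o in outputs]).mean()
    return {'log': {'avg_val_loss': avg_loss}}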
