on_train_end seems to get called before logging of last epoch has finished
🐛 Bug
Maybe not a bug, but unexpected behavior. When using the on_train_end method either to upload the model's latest .csv file created by TestTube to Neptune, or to print the last numeric channel value of a metric sent to Neptune, the values from the final epoch have not yet been logged. When training has finished, the last line of metrics.csv is 2020-04-02 17:23:16.029189,0.04208208369463682,30.0, but see the code below for the outputs/uploads of on_train_end:
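(For context, the two loggers are assumed to be configured roughly as follows, with Neptune at index 0 and TestTube at index 1, matching the self.logger[0] / self.logger[1] indices used below; the project name, API key and epoch count are placeholders, not from the original report.)

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import NeptuneLogger, TestTubeLogger

# Placeholder credentials/paths -- not from the original report
neptune_logger = NeptuneLogger(api_key='ANONYMOUS', project_name='my-workspace/my-project')
testtube_logger = TestTubeLogger(save_dir='logs', name='my_experiment')

# Passing a list of loggers makes them addressable as self.logger[0] / self.logger[1]
trainer = Trainer(logger=[neptune_logger, testtube_logger], max_epochs=30)
trainer.fit(model)
```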
Code sample
```python
import numpy as np
from pathlib import Path

def on_epoch_end(self):
    # Log mean loss per epoch
    train_loss_mean = np.mean(self.training_losses)
    # Save loss of the final epoch for later visualization
    self.final_loss = train_loss_mean
    self.logger[0].experiment.log_metric('epoch/mean_absolute_loss', y=train_loss_mean, x=self.current_epoch)
    self.logger[1].experiment.log({'epoch/mean_absolute_loss': train_loss_mean, 'epoch': self.current_epoch}, global_step=self.current_epoch)
    self.training_losses = []  # reset for next epoch

def on_train_end(self):
    save_dir = Path(self.logger[1].experiment.get_logdir()).parent / 'metrics.csv'
    self.logger[0].experiment.log_artifact(save_dir)
```

Last line of the uploaded metrics.csv: 2020-04-02 15:27:57.044250 0.04208208404108882 29.0
```python
def on_train_end(self):
    log_last = self.logger[0].experiment.get_logs()
    print('Last logged values: ', log_last)
```

Output: Last logged values: {'epoch/mean_absolute_loss': Channel(channelType='numeric', id='b00cd0e5-a427-4a3c-a10c-5033808a930e', lastX=29.0, name='epoch/mean_absolute_loss', x=29.0, y='0.04208208404108882')}
When printing self.final_loss in on_train_end, however, I do get the correct last value.
Expected behavior
on_train_end should only be called after the last values have been logged.
@HenryJia Just tried it, thank you! I’ll close this then 😃
@Dunrar Had a little look at this and your code. on_train_end is not being called before the epoch has finished; it just looks that way. What's actually happening is that the logs aren't finalised/saved until after on_train_end has been called, so that's what you see when you inspect the logs inside on_train_end.
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/training_loop.py#L693
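In other words, the ordering at the linked line is roughly the following (a simplified, hypothetical sketch of the flow being described, not the actual pytorch-lightning source):

```python
# Simplified sketch of the ordering described above (hypothetical, not real source)
def train(self):
    for epoch in range(self.max_epochs):
        self.run_training_epoch()      # on_epoch_end hands metrics to the loggers,
                                       # which may keep them buffered in memory
    self.get_model().on_train_end()    # the user hook runs here ...
    self.logger.save()                 # ... but the logs are only flushed to disk
    self.logger.finalize("success")    # and finalised afterwards
```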
Adding a self.logger[1].save() to the beginning of on_train_end() (or the end of on_epoch_end()) yields the result you'd expect for me with the test_tube logger. I'm not familiar with Neptune, but based on the structure of pytorch-lightning the result should be the same if you add self.logger[0].save() as well.
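A sketch of that workaround applied to the first on_train_end from the code sample above (the save() call on the Neptune logger is the assumed equivalent, as noted):

```python
def on_train_end(self):
    # Flush anything the loggers have buffered but not yet written out
    self.logger[1].save()  # test_tube: writes the pending rows to metrics.csv
    self.logger[0].save()  # neptune: assumed to behave the same way
    save_dir = Path(self.logger[1].experiment.get_logdir()).parent / 'metrics.csv'
    self.logger[0].experiment.log_artifact(save_dir)
```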