
Tensorboard logging by epoch instead of by step


Short question concerning TensorBoard logging:

I am using it like this:

    def training_epoch_end(self, outputs):
        # Average the per-batch losses collected from training_step
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        tensorboard_logs = {'train/loss': avg_loss}
        for name in self.metrics:
            tensorboard_logs['train/{}'.format(name)] = torch.stack([x['metr'][name] for x in outputs]).mean()

        return {'loss': avg_loss, 'log': tensorboard_logs}

It works very well, but in the plots the x-axis is the step, so each batch counts as one step. Is it possible to have the x-axis be the epoch instead, as I want to plot the metrics only once per epoch?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 17 (7 by maintainers)

Top GitHub Comments

20 reactions
adeboissiere commented, Jun 11, 2020

Hi,

It is possible to track both steps and epochs using TensorBoard. Here is an example. It is quite straightforward.

    def training_step(self, batch, batch_idx):
        batch, y = batch  # unpack inputs and targets
        y_hat = self(batch)

        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        loss = F.cross_entropy(y_hat, y.long())
        # Per-batch logs: plotted against the global step as usual
        # (note: n_correct_pred is a raw count, not a normalized accuracy)
        tensorboard_logs = {'train_acc_step': n_correct_pred, 'train_loss_step': loss}

        return {'loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y), 'log': tensorboard_logs}

    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()

        train_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        # The 'step' key overrides the x-axis value, so these curves are plotted per epoch
        tensorboard_logs = {'train_acc': train_acc, 'train_loss': avg_loss, 'step': self.current_epoch}

        return {'loss': avg_loss, 'log': tensorboard_logs}

    def validation_step(self, batch, batch_idx):
        batch, y = batch
        y_hat = self(batch)

        loss = F.cross_entropy(y_hat, y.long())
        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        return {'val_loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y)}

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()

        val_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        # Same 'step' trick: validation metrics are plotted against the epoch
        tensorboard_logs = {'val_loss': avg_loss, 'val_acc': val_acc, 'step': self.current_epoch}

        return {'log': tensorboard_logs}

    def test_step(self, batch, batch_idx):
        batch, y = batch
        y_hat = self(batch)

        loss = F.cross_entropy(y_hat, y.long())
        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        return {'test_loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y)}

    def test_epoch_end(self, outputs):
        avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
        test_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        tensorboard_logs = {'test_loss': avg_loss, 'test_acc': test_acc, 'step': self.current_epoch}

        return {'log': tensorboard_logs}

Here is what it looks like:

[Screenshot: TensorBoard scalar plots with the epoch on the x-axis]

Cheers!
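
For context, the 'step' key works because Lightning forwards it to TensorBoard as the global step when writing the scalars. Below is a minimal standalone sketch of the equivalent raw SummaryWriter call; the log directory and loss values are illustrative only:

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter('lightning_logs/manual_example')  # illustrative log dir
    for epoch in range(3):
        avg_loss = 1.0 / (epoch + 1)  # dummy stand-in for a real epoch average
        # Passing the epoch as global_step makes the x-axis count epochs, not batches
        writer.add_scalar('train_loss', avg_loss, global_step=epoch)
    writer.close()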

5 reactions
mpaepper commented, Jun 8, 2020

Thank you, that was a good hint; I debugged it now.

It's possible to pass in a 'step' key, which is then used as the x-axis value (here the current epoch), like this:

    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        tensorboard_logs = {'train/loss': avg_loss}
        for name in self.metrics:
            tensorboard_logs['train/{}'.format(name)] = torch.stack([x['metr'][name] for x in outputs]).mean()
        # Overrides the x-axis value: the curves are now plotted against the epoch
        tensorboard_logs['step'] = self.current_epoch

        return {'loss': avg_loss, 'log': tensorboard_logs}
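
In newer versions of PyTorch Lightning the *_epoch_end hooks that return a 'log' dict have been removed, but the same epoch-based x-axis can be obtained by writing to the underlying SummaryWriter directly. A minimal sketch, assuming a TensorBoardLogger is attached to the Trainer and that per-batch losses are collected in a hypothetical self._epoch_losses list:

    def on_train_epoch_end(self):
        # Aggregate the losses collected (by assumption) in training_step
        avg_loss = torch.stack(self._epoch_losses).mean()
        # TensorBoardLogger exposes the raw SummaryWriter as self.logger.experiment,
        # so the epoch can be passed explicitly as the global step
        self.logger.experiment.add_scalar('train/loss', avg_loss, self.current_epoch)
        self._epoch_losses.clear()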
