
Tensorboard logging by epoch instead of by step


Short question concerning TensorBoard logging:

I am using it like this:

    def training_epoch_end(self, outputs):
        # Average the per-batch losses collected from training_step
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        tensorboard_logs = {'train/loss': avg_loss}
        for name in self.metrics:
            tensorboard_logs['train/{}'.format(name)] = torch.stack([x['metr'][name] for x in outputs]).mean()

        return {'loss': avg_loss, 'log': tensorboard_logs}

It works very well, but in the plots the x-axis is the step, so each batch counts as one step. Is it possible to have the x-axis be the epoch instead, as I want to plot the metrics only once per epoch?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 17 (7 by maintainers)

Top GitHub Comments

20 reactions
adeboissiere commented, Jun 11, 2020

Hi,

It is possible to track both steps and epochs using TensorBoard. Here is an example. It is quite straightforward.

    def training_step(self, batch, batch_idx):
        batch, y = batch  # unpack inputs and targets
        y_hat = self(batch)

        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        loss = F.cross_entropy(y_hat, y.long())
        # Per-batch logs: plotted against the global step as usual
        # (note: n_correct_pred is a raw count, not a normalized accuracy)
        tensorboard_logs = {'train_acc_step': n_correct_pred, 'train_loss_step': loss}

        return {'loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y), 'log': tensorboard_logs}

    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()

        train_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        # The 'step' key overrides the x-axis value, so these curves are plotted per epoch
        tensorboard_logs = {'train_acc': train_acc, 'train_loss': avg_loss, 'step': self.current_epoch}

        return {'loss': avg_loss, 'log': tensorboard_logs}

    def validation_step(self, batch, batch_idx):
        batch, y = batch
        y_hat = self(batch)

        loss = F.cross_entropy(y_hat, y.long())
        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        return {'val_loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y)}

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()

        val_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        # Same 'step' trick: validation metrics are plotted against the epoch
        tensorboard_logs = {'val_loss': avg_loss, 'val_acc': val_acc, 'step': self.current_epoch}

        return {'log': tensorboard_logs}

    def test_step(self, batch, batch_idx):
        batch, y = batch
        y_hat = self(batch)

        loss = F.cross_entropy(y_hat, y.long())
        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        return {'test_loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y)}

    def test_epoch_end(self, outputs):
        avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
        test_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        tensorboard_logs = {'test_loss': avg_loss, 'test_acc': test_acc, 'step': self.current_epoch}

        return {'log': tensorboard_logs}

Here is what it looks like:

[Screenshot: TensorBoard scalar plots with the epoch on the x-axis]

Cheers!
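
For context, the 'step' key works because Lightning forwards it to TensorBoard as the global step when writing the scalars. Below is a minimal standalone sketch of the equivalent raw SummaryWriter call; the log directory and loss values are illustrative only:

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter('lightning_logs/manual_example')  # illustrative log dir
    for epoch in range(3):
        avg_loss = 1.0 / (epoch + 1)  # dummy stand-in for a real epoch average
        # Passing the epoch as global_step makes the x-axis count epochs, not batches
        writer.add_scalar('train_loss', avg_loss, global_step=epoch)
    writer.close()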

5 reactions
mpaepper commented, Jun 8, 2020

Thank you, that was a good hint; I debugged it now.

It's possible to pass in a 'step' key, which is then used as the x-axis value (here the current epoch), like this:

    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        tensorboard_logs = {'train/loss': avg_loss}
        for name in self.metrics:
            tensorboard_logs['train/{}'.format(name)] = torch.stack([x['metr'][name] for x in outputs]).mean()
        # Overrides the x-axis value: the curves are now plotted against the epoch
        tensorboard_logs['step'] = self.current_epoch

        return {'loss': avg_loss, 'log': tensorboard_logs}
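
In newer versions of PyTorch Lightning the *_epoch_end hooks that return a 'log' dict have been removed, but the same epoch-based x-axis can be obtained by writing to the underlying SummaryWriter directly. A minimal sketch, assuming a TensorBoardLogger is attached to the Trainer and that per-batch losses are collected in a hypothetical self._epoch_losses list:

    def on_train_epoch_end(self):
        # Aggregate the losses collected (by assumption) in training_step
        avg_loss = torch.stack(self._epoch_losses).mean()
        # TensorBoardLogger exposes the raw SummaryWriter as self.logger.experiment,
        # so the epoch can be passed explicitly as the global step
        self.logger.experiment.add_scalar('train/loss', avg_loss, self.current_epoch)
        self._epoch_losses.clear()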
