WandB-Logger drops all the logged values in training step for PyTorch Lightning
See original GitHub issue.

Training step:
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits, masks = self(x)
        loss = self.loss_func(logits, y)
        self.log("train-loss", loss, prog_bar=True)
        return {
            "loss": loss,
            "pred": torch.argmax(logits, -1),
            "labels": y,
        }
At epoch end it logs the precision and recall. After the validation step, I get:

wandb: WARNING Step must only increase in log calls. Step 379 < 380; dropping {'train-loss': 0.494, 'train-precision': 0.9, 'train-recall': 1, 'epoch': 1}
WandB version: 0.10.10 PyTorch-Lightning: 1.0.6
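The warning comes from wandb's requirement that the step passed to a log call never decrease: any row arriving with a step lower than the last one is dropped rather than recorded. A minimal pure-Python sketch of that behavior (a hypothetical `FakeRun` class for illustration only, not the real wandb API):

```python
class FakeRun:
    """Toy stand-in for a wandb run: a log call whose step is lower
    than the last accepted step is dropped with a warning, mimicking
    wandb's "Step must only increase in log calls" behavior."""

    def __init__(self):
        self.history = []   # accepted (step, metrics) rows
        self.last_step = -1

    def log(self, metrics, step):
        if step < self.last_step:
            print(f"WARNING Step must only increase in log calls. "
                  f"{step} < {self.last_step}; dropping {metrics}")
            return
        self.last_step = step
        self.history.append((step, metrics))


run = FakeRun()
run.log({"train-loss": 0.5}, step=380)        # accepted
run.log({"train-precision": 0.9}, step=379)   # step went backwards: dropped
```

This is exactly the shape of the issue above: an out-of-band log call advances the step counter, and the next metrics from the training loop arrive with an older step and are silently discarded.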
Issue Analytics
- State:
- Created 3 years ago
- Comments: 23 (13 by maintainers)
Top Results From Across the Web
WandB-Logger drops all the logged values in training step for ...
I have this issue too! PyTorch Lightning resets step to 0 at the start of a new epoch, but wandb expects step to...
WandbLogger — PyTorch Lightning 1.8.5.post0 documentation
A new W&B run will be created when training starts if you have not created one manually before with wandb.init() . Log metrics....
Pytorch_Lightning + Wandb Starter - Kaggle
It incorporates pytorch lightning and also includes a very useful logger called wandb. pytorch lightning was made with reference to the following notebook....
PyTorch Lightning - Documentation - Weights & Biases - WandB
PyTorch Lightning has a WandbLogger class that can be used to seamlessly log metrics, model weights, media and more. Just instantiate the WandbLogger...
Lightning 1.6: Habana Accelerator, Bagua Distributed, Fault ...
Learn more about what's new in PyTorch Lightning 1.6, ... Saved checkpoints that use the global step value as part of the filename...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Fixed it! I was calling trainer.logger.experiment.log in a callback to log Matplotlib images, and apparently this increases the step unless commit is set to False, in which case it gets grouped with the next logging message sent via PyTorch Lightning. Subtle bug! Everything's OK now and working as expected 😃 thanks for your help @borisdayma. Hope @IsCoelacanth was able to resolve their issue too!

This was fixed with https://github.com/PyTorchLightning/pytorch-lightning/pull/5931. You can now just log at any time with self.log('my_metric', my_value) and you won't have any dropped values. Just choose your x-axis appropriately in the UI (either global_step or the auto-incremented step).
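The key to the fix above is wandb's commit flag: by default each log call commits a row and advances the step, while commit=False buffers the metrics so they are merged into the next committed row without advancing the step. A toy model of that grouping behavior (illustrative only, not wandb's actual internals):

```python
class GroupingRun:
    """Toy model of wandb's commit semantics: log(..., commit=False)
    buffers metrics without advancing the step; the next committed
    call flushes everything buffered into a single row."""

    def __init__(self):
        self.history = []   # committed (step, metrics) rows
        self.pending = {}   # metrics waiting to be committed
        self.step = 0

    def log(self, metrics, commit=True):
        self.pending.update(metrics)
        if commit:
            self.history.append((self.step, dict(self.pending)))
            self.pending.clear()
            self.step += 1


run = GroupingRun()
# Extra payload from a callback: buffered, step stays put.
run.log({"examples": "matplotlib-figure"}, commit=False)
# Next regular metric commits one merged row at the same step.
run.log({"train-loss": 0.5})
```

This mirrors why passing commit=False in the callback resolved the issue: the Matplotlib images no longer advance the step on their own, so the training-loop metrics that follow land on a step that is still valid.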