WandB-Logger drops all the logged values in training step for PyTorch Lightning
See original GitHub issue.

Training step:
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits, masks = self(x)
        loss = self.loss_func(logits, y)
        self.log("train-loss", loss, prog_bar=True)
        return {
            "loss": loss,
            "pred": torch.argmax(logits, -1),
            "labels": y,
        }
At epoch end it logs the precision and recall. After the validation step, I get:

wandb: WARNING Step must only increase in log calls. Step 379 < 380; dropping {'train-loss': 0.494, 'train-precision': 0.9, 'train-recall': 1, 'epoch': 1}
WandB version: 0.10.10 PyTorch-Lightning: 1.0.6
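The warning comes from wandb's requirement that the step passed to a log call never decrease: any row arriving with a step lower than the last one is dropped rather than recorded. A minimal pure-Python sketch of that behavior (a hypothetical `FakeRun` class for illustration only, not the real wandb API):

```python
class FakeRun:
    """Toy stand-in for a wandb run: a log call whose step is lower
    than the last accepted step is dropped with a warning, mimicking
    wandb's "Step must only increase in log calls" behavior."""

    def __init__(self):
        self.history = []   # accepted (step, metrics) rows
        self.last_step = -1

    def log(self, metrics, step):
        if step < self.last_step:
            print(f"WARNING Step must only increase in log calls. "
                  f"{step} < {self.last_step}; dropping {metrics}")
            return
        self.last_step = step
        self.history.append((step, metrics))


run = FakeRun()
run.log({"train-loss": 0.5}, step=380)        # accepted
run.log({"train-precision": 0.9}, step=379)   # step went backwards: dropped
```

This is exactly the shape of the issue above: an out-of-band log call advances the step counter, and the next metrics from the training loop arrive with an older step and are silently discarded.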
Issue Analytics
- State:
- Created 3 years ago
- Comments: 23 (13 by maintainers)
Top Results From Across the Web
WandB-Logger drops all the logged values in training step for ...
I have this issue too! PyTorch Lightning resets step to 0 at the start of a new epoch, but wandb expects step to...
WandbLogger — PyTorch Lightning 1.8.5.post0 documentation
A new W&B run will be created when training starts if you have not created one manually before with wandb.init() . Log metrics....
Pytorch_Lightning + Wandb Starter - Kaggle
It incorporates pytorch lightning and also includes a very useful logger called wandb. pytorch lightning was made with reference to the following notebook....
PyTorch Lightning - Documentation - Weights & Biases - WandB
PyTorch Lightning has a WandbLogger class that can be used to seamlessly log metrics, model weights, media and more. Just instantiate the WandbLogger...
Lightning 1.6: Habana Accelerator, Bagua Distributed, Fault ...
Learn more about what's new in PyTorch Lightning 1.6, ... Saved checkpoints that use the global step value as part of the filename...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Fixed it! I was calling trainer.logger.experiment.log in a callback to log Matplotlib images, and apparently this increases the step unless commit is set to False, in which case it gets grouped with the next logging message sent via PyTorch Lightning. Subtle bug! Everything's OK now and working as expected 😃 thanks for your help @borisdayma. Hope @IsCoelacanth was able to resolve their issue too!

This was fixed with https://github.com/PyTorchLightning/pytorch-lightning/pull/5931. You can now just log at any time with self.log('my_metric', my_value) and you won't have any dropped values. Just choose your x-axis appropriately in the UI (either global_step or the auto-incremented step).
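The key to the fix above is wandb's commit flag: by default each log call commits a row and advances the step, while commit=False buffers the metrics so they are merged into the next committed row without advancing the step. A toy model of that grouping behavior (illustrative only, not wandb's actual internals):

```python
class GroupingRun:
    """Toy model of wandb's commit semantics: log(..., commit=False)
    buffers metrics without advancing the step; the next committed
    call flushes everything buffered into a single row."""

    def __init__(self):
        self.history = []   # committed (step, metrics) rows
        self.pending = {}   # metrics waiting to be committed
        self.step = 0

    def log(self, metrics, commit=True):
        self.pending.update(metrics)
        if commit:
            self.history.append((self.step, dict(self.pending)))
            self.pending.clear()
            self.step += 1


run = GroupingRun()
# Extra payload from a callback: buffered, step stays put.
run.log({"examples": "matplotlib-figure"}, commit=False)
# Next regular metric commits one merged row at the same step.
run.log({"train-loss": 0.5})
```

This mirrors why passing commit=False in the callback resolved the issue: the Matplotlib images no longer advance the step on their own, so the training-loop metrics that follow land on a step that is still valid.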