Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

wandb loses "global_step" in PyTorch event_writer

See original GitHub issue

There doesn’t seem to be a correspondence between global_step and step in my graphs. I’m using wandb through the event_writer API in PyTorch. Debugging a bit, this is what seems to happen:

step gets converted to global_step here: https://github.com/wandb/client/blob/6417dd926abe76dbb7c56e7017d2ee7d1c918eb5/wandb/tensorboard/__init__.py#L205

Then wandb.log gets called with this dict; since it doesn’t see a step argument, it assigns one automatically.

A custom global step is useful for comparing data efficiency consistently across runs – using “forward calls” as the x-axis makes your curves look 2x better when you double the batch size or the number of workers. Using a global data counter for steps gives easier-to-interpret curves. @vanpelt
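
To make the mismatch concrete, here is a minimal sketch of the setup being described (the project name, the dummy loss, and logging every 10 steps are illustrative assumptions, not details from the issue; it assumes tensorboard syncing is enabled via sync_tensorboard=True):

    import wandb
    from torch.utils.tensorboard import SummaryWriter

    # sync_tensorboard=True makes wandb patch tensorboard event writing,
    # which is the code path where step is remapped to a global_step metric.
    wandb.init(project="step-demo", sync_tensorboard=True)
    writer = SummaryWriter()

    # Log once every 10 forward calls, so global_step advances by 10 each time.
    for global_step in range(0, 100, 10):
        loss = 1.0 / (global_step + 1)  # dummy metric
        writer.add_scalar("loss", loss, global_step=global_step)

    writer.close()
    wandb.finish()

Under the behaviour described above, each add_scalar call ends up as roughly wandb.log({"loss": ..., "global_step": ...}) with no step= argument, so wandb’s automatic step counter runs 0, 1, 2, … while global_step runs 0, 10, 20, … – and the two axes no longer correspond.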

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

3 reactions
vanpelt commented, Oct 17, 2019

Yeah, we look for any metrics that are monotonically increasing and make them available as x-axis options. MLflow, TensorBoard, and others allow you to go back in time during training, and we don’t. By putting the step in metrics we’re punting on the missing feature: users can log historic steps, at the cost of using them as an x-axis (but still being able to export them via the Python API). The big rewrite will enable this behaviour. We’re seeing a bunch of users syncing tensorboard, so we should at least default to global_step as the x-axis in this case.
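
Until that rewrite lands, one workaround consistent with the comment above is to bypass the tensorboard patching for the affected metrics and pass the global step to wandb.log yourself (a minimal sketch; the project name and loss values are made up):

    import wandb

    run = wandb.init(project="step-demo")

    for global_step in range(0, 100, 10):
        loss = 1.0 / (global_step + 1)  # dummy metric
        # Passing step= explicitly keeps wandb's x-axis aligned with
        # global_step instead of the automatic 0, 1, 2, ... counter.
        wandb.log({"loss": loss}, step=global_step)

    run.finish()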

0 reactions
yaroslavvb commented, Oct 22, 2019

Thanks for the info, closing this as a duplicate of https://github.com/wandb/client/issues/613

Read more comments on GitHub >

Top Results From Across the Web

wandb loses "global_step" in PyTorch event_writer · Issue #619
I'm using wandb through eventwriter API in PyTorch. ... Custom global step is useful to compare data efficiency consistency across runs ...

PyTorch - Documentation - Weights & Biases - Wandb
W&B provides first class support for PyTorch, from logging gradients to profiling your code on the CPU and GPU.

wandb_logger — PyTorch-Ignite v0.4.10 Documentation
See documentation # on wandb.init for details. wandb_logger = WandBLogger( ... global step transform function to output a desired global step.

Supercharge your Training with PyTorch Lightning + Weights ...
In this video, Weights & Biases Deep Learning Educator Charles Frye demonstrates how to use PyTorch Lightning with W&B to build ML pipelines ...

Experiment Logging with TensorBoard and wandb
To start with PyTorch version of TensorBoard, just install it from ... in a subgroup tag="loss" ) # incrementing the global step (number...
