Resume training and plot


Hello,

I have a run that was terminated in the middle of training, and now I want to resume the training. I didn't set resume=True in wandb.init(), but I saved the model separately using PyTorch. Is it possible to load the previous plot and continue logging to it?

Thank you

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

5 reactions
vanpelt commented, Dec 18, 2020

@kargarisaac to manually resume an existing run, call wandb.init(id="YOUR_RUN_ID", project="YOUR_PROJECT", resume="must"). Subsequent calls to wandb.log will then append metrics to that run, as documented here: https://docs.wandb.com/library/resuming
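To make the comment above concrete, here is a minimal sketch of resuming a run alongside a separately saved PyTorch checkpoint. Only the wandb.init arguments and wandb.log come from the comment. The helper names, the checkpoint layout (a dict with an "epoch" key) and the train_one_epoch callback are assumptions made for illustration.

```python
def resume_init_kwargs(run_id, project):
    """Build the wandb.init arguments suggested in the comment above."""
    return {"id": run_id, "project": project, "resume": "must"}


def resume_and_log(run_id, project, checkpoint_path, num_epochs, train_one_epoch):
    """Resume an existing run and keep appending metrics to its plots.

    ASSUMPTIONS (not from the original issue):
      - the checkpoint is a dict saved with torch.save that has an "epoch" key
      - train_one_epoch(epoch) is a user-supplied callable returning a float loss
    """
    # Imported lazily so the helper above can be used without these packages.
    import torch
    import wandb

    # resume="must" makes wandb raise an error if the run id does not exist.
    run = wandb.init(**resume_init_kwargs(run_id, project))

    checkpoint = torch.load(checkpoint_path)
    start_epoch = checkpoint["epoch"] + 1  # continue after the last saved epoch

    for epoch in range(start_epoch, num_epochs):
        loss = train_one_epoch(epoch)
        # New points are appended to the metrics of the resumed run.
        wandb.log({"loss": loss, "epoch": epoch})

    run.finish()
```

Restoring the model and optimizer state from the checkpoint would happen inside or before train_one_epoch. That part depends on how the checkpoint was saved and is left out here.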

2 reactions
vanpelt commented, Jul 8, 2021

@nbortych the run_id is available in the URL of the run itself, or from the overview page, e.g. https://wandb.ai/vanpelt/reproducibility/runs/3f87uku2/overview

You can find the run id in the last part of the “Run Path” attribute or in the URL; it is 3f87uku2 in this case.

You can also access the id of the run programmatically in your script via wandb.run.id
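If only the run URL is at hand, the id can also be pulled out of it in code. The following is a small sketch that assumes the URL follows the pattern of the example above, with the id as the path segment right after "runs". The function name run_id_from_url is hypothetical and not part of wandb.

```python
from urllib.parse import urlparse


def run_id_from_url(url):
    """Return the path segment that follows "runs" in a run URL.

    ASSUMPTION: the URL looks like .../<entity>/<project>/runs/<run_id>/...,
    as in the example URL from the comment above.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "runs" not in parts:
        raise ValueError(f"no 'runs' segment in URL: {url}")
    idx = parts.index("runs")
    if idx + 1 >= len(parts):
        raise ValueError(f"no run id after 'runs' in URL: {url}")
    return parts[idx + 1]


run_id = run_id_from_url(
    "https://wandb.ai/vanpelt/reproducibility/runs/3f87uku2/overview"
)
print(run_id)  # 3f87uku2
```

The resulting string can then be passed as the id argument of wandb.init together with resume="must".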


