How to resume training from last checkpoint?

Hi,

How to resume training from a previous checkpoint on a custom dataset? I know that you need to change

trainer.resume_or_load(resume=True)

Besides this, what values should cfg.merge_from_file and cfg.MODEL.WEIGHTS have?

Many thanks!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 16 (1 by maintainers)

Top GitHub Comments

10 reactions
Keiku commented, Mar 15, 2020

The --resume option is defined in the default argument parser as follows.

https://github.com/facebookresearch/detectron2/blob/662cbb71538fc5169dc2361f97ca0e4ed2961f75/detectron2/engine/defaults.py#L48

    parser.add_argument(
        "--resume",
        action="store_true",
        help="whether to attempt to resume from the checkpoint directory",
    )

Because the option uses action="store_true", it takes no value; passing --resume simply sets the flag to True. Therefore, if you run the program as follows, training is resumed and the checkpoint named in the last_checkpoint file of the specified OUTPUT_DIR is loaded. If the last_checkpoint file does not exist, normal training starts instead.

python tools/train_net.py \
    --config-file configs/PascalVOC-Detection/faster_rcnn_R_50_C4.yaml \
    --resume \
    OUTPUT_DIR output
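
The lookup described above can be sketched in plain Python. This is only an illustration of the idea, not detectron2's code: resolve_checkpoint is a hypothetical helper, and it assumes that last_checkpoint is a small text file in OUTPUT_DIR holding the file name of the most recent checkpoint, with cfg.MODEL.WEIGHTS acting as the fallback when there is nothing to resume from.

```python
import os
import tempfile


def resolve_checkpoint(output_dir, resume, default_weights):
    """Hypothetical helper: decide which weights file training would load.

    Assumption: `last_checkpoint` is a text file inside `output_dir` whose
    content is the file name of the latest checkpoint.
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if resume and os.path.isfile(marker):
        with open(marker) as f:
            name = f.read().strip()
        # Resume: continue from the checkpoint recorded in the marker file.
        return os.path.join(output_dir, name)
    # No resume requested, or nothing to resume from: use the configured weights.
    return default_weights


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        # Fresh output directory: falls back to the configured weights.
        print(resolve_checkpoint(out, True, "pretrained.pkl"))
        # Simulate a finished checkpoint write (file name is made up).
        with open(os.path.join(out, "last_checkpoint"), "w") as f:
            f.write("model_0004999.pth")
        print(resolve_checkpoint(out, True, "pretrained.pkl"))
```

Under these assumptions the config file and cfg.MODEL.WEIGHTS can stay the same as in the original run; the weights setting only matters when no last_checkpoint file is found.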

8 reactions
ppwwyyxx commented, Dec 1, 2020

python3.6 train_net.py --num-gpus 1 --config-file configs/R-50-grid.yaml OUTPUT_DIR output --resume

In this command, --resume has to come before OUTPUT_DIR.
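
Why the order matters can be reproduced with a small stand-in parser. The sketch below is not detectron2's parser; it assumes the config overrides are collected by a trailing positional (called opts here) declared with nargs=argparse.REMAINDER. With such a positional, every token after the first override, including --resume, is swallowed as an override instead of being parsed as a flag.

```python
import argparse


def build_parser():
    # Stand-in parser for illustration only; argument names are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", default="")
    parser.add_argument("--resume", action="store_true")
    # REMAINDER collects all remaining tokens, flags included.
    parser.add_argument("opts", default=None, nargs=argparse.REMAINDER)
    return parser


def parse(argv):
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    good = parse(["--config-file", "c.yaml", "--resume", "OUTPUT_DIR", "output"])
    bad = parse(["--config-file", "c.yaml", "OUTPUT_DIR", "output", "--resume"])
    print(good.resume, good.opts)
    print(bad.resume, bad.opts)
```

In the second call the flag ends up inside opts and resume stays False, which matches the behaviour reported in this thread.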
