Can I train the model starting from the files in the "checkpoints" dir?

I copied the files from the checkpoints dir into ./output/ctpn_end2end/voc_2007_trainval/, but when I run python3 ./ctpn/train_net.py I get the following error:

Traceback (most recent call last):
  File "./ctpn/train_net.py", line 37, in <module>
    restore=bool(int(cfg.TRAIN.restore)))
  File "/home/fant/projects/python/text-detection-ctpn-master/lib/fast_rcnn/train.py", line 233, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "/home/fant/projects/python/text-detection-ctpn-master/lib/fast_rcnn/train.py", line 187, in train_model
    if last_snapshot_iter != iter:
UnboundLocalError: local variable 'iter' referenced before assignment

How can I resolve it?
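
A note on the traceback above: an UnboundLocalError on a loop variable usually means the loop body never ran. One plausible cause here, stated as an assumption rather than a confirmed diagnosis, is that the iteration number restored from the checkpoint is already greater than or equal to max_iters, so a loop of the form `for iter in range(restore_iter, max_iters)` executes zero times and `iter` is never assigned. The sketch below reproduces that situation in isolation. The function names, the `step` variable and the loop shape are hypothetical and are not taken from lib/fast_rcnn/train.py.

```python
def train_unsafe(restore_iter, max_iters):
    """Hypothetical loop shape: `step` is bound only if the loop body runs."""
    last_snapshot_iter = -1
    for step in range(restore_iter, max_iters):
        pass  # one training step would go here
    # If restore_iter >= max_iters the loop never ran, so `step` is unbound
    # and the comparison below raises UnboundLocalError.
    if last_snapshot_iter != step:
        last_snapshot_iter = step
    return last_snapshot_iter


def train_safe(restore_iter, max_iters):
    """Same loop, but `step` has a value even when no iterations run."""
    last_snapshot_iter = -1
    step = restore_iter - 1  # last step completed before resuming
    for step in range(restore_iter, max_iters):
        pass  # one training step would go here
    if last_snapshot_iter != step:
        last_snapshot_iter = step
    return last_snapshot_iter


def raises_unbound(restore_iter, max_iters):
    """Return True if the unsafe variant fails the way the traceback does."""
    try:
        train_unsafe(restore_iter, max_iters)
    except UnboundLocalError:
        return True
    return False
```

If this is indeed the cause, setting max_iters higher than the iteration number stored in the checkpoint would let the loop run, and initialising the loop variable before the loop would avoid the crash in the zero-iteration case.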

Top GitHub Comments

1 reaction

cipri-tom commented, Feb 9, 2018

@FantDing the loss tells you almost nothing, unless it is diverging. Remember it is an average of many things. Try to evaluate the model and see how it performs.

Next week I’ll try to integrate an eval step during training, so you can pass a validation set and check the accuracy in TensorBoard.

1 reaction

cipri-tom commented, Feb 8, 2018

It depends on what you train on. The model in checkpoints was trained as detailed in the readme. If you want to continue training with your own images, I imagine they are very different from the original training data, so the model will be “confused” and it is normal to see a high loss.

If you want to train on a different set of images, I suggest you train from scratch. If that is too slow for you, I know of a couple of optimisations, which I implemented in my fork.
