How to train models?

#13 was helpful for using the pre-trained models. However, I would like to train my own models. As a first step, I removed all the existing models:

# cd trained_models/
# ls
conll_2003_en		     i2b2_2014_glove_stanford_bioes  mimic_glove_stanford_bioes
i2b2_2014_glove_spacy_bioes  mimic_glove_spacy_bioes	     performances.md
# rm -rf *

Then I trained a model using data/conll2003/en/ (identical data, default parameters.ini):

# python3 main.py --maximum_number_of_epochs=1 --token_pretrained_embedding_filepath=""

I then see a model in ../output/en_2017-07-26_22-06-40-510129/model. However, I get this error when trying to use the model:

# ls ../output/en_2017-07-26_22-06-40-510129/model/
checkpoint				     model_00001.ckpt.meta
dataset.pickle				     parameters.ini
events.out.tfevents.1501106802.114dd5c0c94c  projector_config.pbtxt
model_00001.ckpt.data-00000-of-00001	     tensorboard_metadata_characters.tsv
model_00001.ckpt.index			     tensorboard_metadata_tokens.tsv
# python3 main.py --train_model=False --use_pretrained_model=True --dataset_text_folder=../data/example_unannotated_texts --pretrained_model_folder=../output/en_2017-07-26_22-06-40-510129/model/
NeuroNER version: 1.0-dev
TensorFlow version: 1.2.1
{'character_embedding_dimension': 25,
 'character_lstm_hidden_state_dimension': 25,
 'check_for_digits_replaced_with_zeros': 1,
 'check_for_lowercase': 1,
 'dataset_text_folder': '../data/example_unannotated_texts',
 'debug': 0,
 'dropout_rate': 0.5,
 'experiment_name': 'test',
 'freeze_token_embeddings': 0,
 'gradient_clipping_value': 5.0,
 'learning_rate': 0.005,
 'load_all_pretrained_token_embeddings': 0,
 'load_only_pretrained_token_embeddings': 0,
 'main_evaluation_mode': 'conll',
 'maximum_number_of_epochs': 100,
 'number_of_cpu_threads': 8,
 'number_of_gpus': 0,
 'optimizer': 'sgd',
 'output_folder': '../output',
 'parameters_filepath': './parameters.ini',
 'patience': 10,
 'plot_format': 'pdf',
 'pretrained_model_folder': '../output/en_2017-07-26_22-06-40-510129/model/',
 'reload_character_embeddings': 1,
 'reload_character_lstm': 1,
 'reload_crf': 1,
 'reload_feedforward': 1,
 'reload_token_embeddings': 1,
 'reload_token_lstm': 1,
 'remap_unknown_tokens_to_unk': 1,
 'spacylanguage': 'en',
 'tagging_format': 'bioes',
 'token_embedding_dimension': 100,
 'token_lstm_hidden_state_dimension': 100,
 'token_pretrained_embedding_filepath': '../data/word_vectors/glove.6B.100d.txt',
 'tokenizer': 'spacy',
 'train_model': 0,
 'use_character_lstm': 1,
 'use_crf': 1,
 'use_pretrained_model': 1,
 'verbose': 0}
Formatting deploy set from BRAT to CONLL... Done.
Converting CONLL from BIO to BIOES format... Done.
Load dataset... done (19.13 seconds)
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../output/en_2017-07-26_22-06-40-510129/model/model.ckpt
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 250, in <module>
    main()
  File "main.py", line 245, in main
    nn = NeuroNER(**arguments)
  File "/NeuroNER/src/neuroner.py", line 285, in __init__
    self.transition_params_trained = model.restore_from_pretrained_model(parameters, dataset, sess, token_to_vector=token_to_vector)
  File "/NeuroNER/src/entity_lstm.py", line 337, in restore_from_pretrained_model
    self.saver.restore(sess, pretrained_model_checkpoint_filepath) # Works only when the dimensions of tensor variables are matched.
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../output/en_2017-07-26_22-06-40-510129/model/model.ckpt
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]

Caused by op 'save/RestoreV2_23', defined at:
  File "main.py", line 250, in <module>
    main()
  File "main.py", line 245, in main
    nn = NeuroNER(**arguments)
  File "/NeuroNER/src/neuroner.py", line 278, in __init__
    model = EntityLSTM(dataset, parameters)
  File "/NeuroNER/src/entity_lstm.py", line 216, in __init__
    self.saver = tf.train.Saver(max_to_keep=parameters['maximum_number_of_epochs'])  # defaults to saving all variables
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 640, in restore_v2
    dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../output/en_2017-07-26_22-06-40-510129/model/model.ckpt
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]

Exception ignored in: <bound method NeuroNER.__del__ of <neuroner.NeuroNER object at 0x7f9eb30e7f60>>
Traceback (most recent call last):
  File "/NeuroNER/src/neuroner.py", line 489, in __del__
    self.sess.close()
AttributeError: 'NeuroNER' object has no attribute 'sess'

I’m not sure what to make of this, since I don’t see the need for a checkpoint file when using the pretrained models, and there is no model.ckpt in the pre-trained models (although I did see model.ckpt.index, model.ckpt.data-00000-of-00001, and model.ckpt.meta).

Note: I am using Python 3.6. I assumed that this would not break compatibility with 3.5, but I have yet to test with 3.5.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5

Top GitHub Comments

4 reactions
jennyjylee commented, Jul 27, 2017

To load a pretrained model, NeuroNER requires the files parameters.ini, dataset.pickle, model.ckpt.index, model.ckpt.data-00000-of-00001 and model.ckpt.meta to be present in the specified pretrained_model_folder. The last three files are how TensorFlow saves a checkpoint; the header in the parameters.ini file refers to them collectively as model.ckpt for simplicity.
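
As a quick sanity check before pointing pretrained_model_folder at a directory, something like the sketch below can report which of the five files named above are absent. The helper name missing_pretrained_files is hypothetical and not part of NeuroNER; only the file names come from the comment above.

```python
from pathlib import Path

# File names taken from the comment above. The helper itself is a
# hypothetical illustration, not a NeuroNER function.
REQUIRED_FILES = (
    "parameters.ini",
    "dataset.pickle",
    "model.ckpt.index",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.meta",
)


def missing_pretrained_files(model_folder):
    """Return the required file names that are absent from model_folder."""
    folder = Path(model_folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

For the folder listed in the question, which contains model_00001.ckpt.* rather than model.ckpt.*, this would be expected to report the three model.ckpt.* names as missing.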

Your example should run properly if you simply rename the three files model_00001.ckpt.index, model_00001.ckpt.data-00000-of-00001, and model_00001.ckpt.meta to model.ckpt.index, model.ckpt.data-00000-of-00001 and model.ckpt.meta by removing ‘_00001’ from the filenames.
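
The rename can also be scripted. The following is a minimal sketch that assumes the checkpoint files follow the model_<epoch>.ckpt.<suffix> pattern seen in the directory listing in the question; the function name strip_epoch_suffix is hypothetical, and the sketch leaves every other file in the folder (including the checkpoint file) untouched.

```python
import re
from pathlib import Path

# Matches e.g. "model_00001.ckpt.index": group 1 is the epoch number,
# group 2 is everything after ".ckpt" (".index", ".meta", ".data-...").
EPOCH_CHECKPOINT = re.compile(r"^model_(\d+)\.ckpt(\..+)$")


def strip_epoch_suffix(model_folder, epoch):
    """Rename model_<epoch>.ckpt.* to model.ckpt.* inside model_folder.

    Returns the new file names. Hypothetical helper, not part of NeuroNER.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(model_folder).iterdir()):
        match = EPOCH_CHECKPOINT.match(path.name)
        if match is None or int(match.group(1)) != epoch:
            continue
        target = path.with_name("model.ckpt" + match.group(2))
        if target.exists():
            # Refuse to overwrite an existing model.ckpt.* file.
            raise FileExistsError("{} already exists".format(target))
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling strip_epoch_suffix("../output/en_2017-07-26_22-06-40-510129/model", 1) would then perform the three renames described above. Working on a copy of the folder is the safer choice, since the rename is done in place.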

In order to export the model more cleanly, please refer to the Sharing a pretrained model section in README.md. The src/prepare_pretrained_model.py script should take care of renaming and copying over the necessary files to a new folder.

1 reaction
JDongian commented, Jul 27, 2017

Thank you Jenny, this advice resolved the issue.
