Restoring weights from a checkpoint fails with missing keys when following Score2perf's README
Environment: Ubuntu 18.04, Python 3.7.9, TensorFlow 2.3.1
When I follow https://github.com/magenta/magenta/blob/master/magenta/models/score2perf/README.md, the checkpoint is missing some keys. The problem happens when I follow the "Training" and "Sampling from the model" parts.
This issue is actually very similar to https://github.com/magenta/magenta/issues/1647.
The score2perf code now imports tf.compat.v1 as tf, so it should not matter that I use TensorFlow 2.
The training command, as given in the README, is shown below.
DATA_DIR=/generated/tfrecords/dir
HPARAMS_SET=score2perf_transformer_base
MODEL=transformer
PROBLEM=score2perf_maestro_language_uncropped_aug
TRAIN_DIR=/training/dir
HPARAMS=\
"label_smoothing=0.0,"\
"max_length=0,"\
"max_target_seq_length=2048"
t2t_trainer \
--data_dir="${DATA_DIR}" \
--hparams=${HPARAMS} \
--hparams_set=${HPARAMS_SET} \
--model=${MODEL} \
--output_dir=${TRAIN_DIR} \
--problem=${PROBLEM} \
--train_steps=1000000
When I run the training command as the README describes, I get the error below. It appears after training for 1000 epochs, when the script tries to load the epoch-1000 checkpoint and run evaluation.
Not found: Key transformer/parallel_0_3/transformer/transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel not found in checkpoint
[[node save/RestoreV2_1 (defined at /.pyenv/versions/3.7.9/envs/tensor2tensor/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:629) ]]
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
However, when I run the training command again, it surprisingly succeeds in loading the checkpoint, resumes training from epoch 1000, and saves the epoch-2000 weights. But when it then loads the epoch-2000 checkpoint and tries to run evaluation, it fails again.
For Inference (Sampling from the model), it just fails.
Could anyone help? Thanks in advance.
Top GitHub Comments
Hi. I managed to fix the problem for myself by cloning the latest tensor2tensor master branch and installing it by using
pip install -e .
in the local tensor2tensor repository.

Hello. I also have the same problem. Are there any solutions yet?
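
The fix reported in the first comment (installing tensor2tensor from its master branch in editable mode) could look roughly like the sketch below. The repository URL is the public tensor2tensor repository. The clone location `T2T_SRC_DIR` and the function name `install_t2t_from_master` are assumptions made for illustration and do not come from the thread.

```shell
# Sketch only: install tensor2tensor from the latest master branch.
# T2T_SRC_DIR is an assumed clone location; change it to suit your setup.
T2T_REPO_URL="https://github.com/tensorflow/tensor2tensor.git"
T2T_SRC_DIR="${T2T_SRC_DIR:-${HOME}/tensor2tensor}"

# Hypothetical helper: clone the repository, then install it in editable
# mode so the environment imports the cloned source instead of the
# released package.
install_t2t_from_master() {
  git clone "${T2T_REPO_URL}" "${T2T_SRC_DIR}" || return 1
  # Run pip in a subshell so the caller's working directory is unchanged.
  ( cd "${T2T_SRC_DIR}" && pip install -e . )
}
```

Call `install_t2t_from_master` inside the same Python environment that runs `t2t_trainer`, then rerun the training command. Whether this resolves the error on other setups is not confirmed in the thread.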