Problem when Converting a Fine-tuned Checkpoint from TF to PyTorch using ALBERTxxlargev1 Model
🐛 Bug
Information
Model I am using: ALBERT xxlarge v1
Language I am using the model on: English
The problem arises when using: converting a fine-tuned checkpoint from TF to PyTorch. There is no problem converting pre-trained (not fine-tuned) checkpoints from TF.
- the official example scripts:
!python /content/transformers/src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py \
--tf_checkpoint_path /content/pretrained_models/albertsquad/model.ckpt-best \
--albert_config_file /content/pretrained_models/albertsquad/config.json \
--pytorch_dump_path /content/pretrained_models/albertsquad/pytorch_model.bin
My SentencePiece vocabulary model was also placed in the same folder, named "spiece.model", along with model.ckpt-best.index and model.ckpt-best.meta.
I think the problem resides here https://github.com/huggingface/transformers/blob/352d5472b0c1dec0f420d606d16747d851b4bda8/src/transformers/modeling_albert.py#L120 and here https://github.com/huggingface/transformers/blob/352d5472b0c1dec0f420d606d16747d851b4bda8/src/transformers/modeling_albert.py#L160, or in the renaming of the TF variable names around line 70 of modeling_albert.py (a quick way to see what the fine-tuned checkpoint actually contains is sketched after the task list below).
The task I am working on is:
- an official GLUE/SQuAD task: SQuAD
- my own task or dataset: not related
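As a quick sanity check, the variables stored in the fine-tuned checkpoint can be listed with something like the snippet below (a minimal sketch using the same path as the conversion command above). Besides the usual bert/... weights it also shows the task head (output_weights, output_bias) and the Adam optimizer slots, which is exactly what the converter does not expect:

```python
# Minimal sketch: list what the fine-tuned TF checkpoint actually contains.
# The path is the same one used in the conversion command above.
import tensorflow as tf

ckpt = "/content/pretrained_models/albertsquad/model.ckpt-best"
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)
# Besides the bert/... weights, this also prints output_weights, output_bias
# and the */adam_m, */adam_v optimizer slots created during fine-tuning.
```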
To reproduce
Steps to reproduce the behavior:
- Pre-train an ALBERT xxlarge model with the v1 configuration in TF, then fine-tune it on a GLUE or SQuAD task in TF (not PyTorch).
- Copy the TF checkpoint into a folder along with the SentencePiece model as "spiece.model" and the config file as "config.json".
- Try to convert the TF checkpoint to PyTorch; you will get this output:
2020-04-13 21:26:33.470832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Building PyTorch model from configuration: AlbertConfig {
"_num_labels": 2,
"architectures": null,
"attention_probs_dropout_prob": 0,
"bad_words_ids": null,
"bos_token_id": 2,
"classifier_dropout_prob": 0.1,
"decoder_start_token_id": null,
"do_sample": false,
"down_scale_factor": 1,
"early_stopping": false,
"embedding_size": 128,
"eos_token_id": 3,
"finetuning_task": null,
"gap_size": 0,
"hidden_act": "gelu",
"hidden_dropout_prob": 0,
"hidden_size": 4096,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.01,
"inner_group_num": 1,
"intermediate_size": 16384,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-12,
"layers_to_keep": [],
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 512,
"min_length": 0,
"model_type": "albert",
"net_structure_type": 0,
"no_repeat_ngram_size": 0,
"num_attention_heads": 64,
"num_beams": 1,
"num_hidden_groups": 1,
"num_hidden_layers": 12,
"num_memory_blocks": 0,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pad_token_id": 0,
"prefix": null,
"pruned_heads": {},
"repetition_penalty": 1.0,
"task_specific_params": null,
"temperature": 1.0,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"type_vocab_size": 2,
"use_bfloat16": false,
"vocab_size": 30000,
"xla_device": null
}
INFO:transformers.modeling_albert:Converting TensorFlow checkpoint from /content/pretrained_models/albertCOVIDglue/model.ckpt-best
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/LayerNorm/beta with shape [128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/LayerNorm/beta/adam_m with shape [128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/LayerNorm/beta/adam_v with shape [128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/LayerNorm/gamma with shape [128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/LayerNorm/gamma/adam_m with shape [128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/LayerNorm/gamma/adam_v with shape [128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/position_embeddings with shape [512, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/position_embeddings/adam_m with shape [512, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/position_embeddings/adam_v with shape [512, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/token_type_embeddings with shape [2, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/token_type_embeddings/adam_m with shape [2, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/token_type_embeddings/adam_v with shape [2, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/word_embeddings with shape [30000, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/word_embeddings/adam_m with shape [30000, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/embeddings/word_embeddings/adam_v with shape [30000, 128]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/embedding_hidden_mapping_in/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/embedding_hidden_mapping_in/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/embedding_hidden_mapping_in/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/embedding_hidden_mapping_in/kernel with shape [128, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/embedding_hidden_mapping_in/kernel/adam_m with shape [128, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/embedding_hidden_mapping_in/kernel/adam_v with shape [128, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias with shape [16384]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m with shape [16384]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v with shape [16384]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel with shape [4096, 16384]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m with shape [4096, 16384]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v with shape [4096, 16384]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel with shape [16384, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m with shape [16384, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v with shape [16384, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/pooler/dense/bias with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/pooler/dense/bias/adam_m with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/pooler/dense/bias/adam_v with shape [4096]
INFO:transformers.modeling_albert:Loading TF weight bert/pooler/dense/kernel with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/pooler/dense/kernel/adam_m with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight bert/pooler/dense/kernel/adam_v with shape [4096, 4096]
INFO:transformers.modeling_albert:Loading TF weight global_step with shape []
INFO:transformers.modeling_albert:Loading TF weight output_bias with shape [3]
INFO:transformers.modeling_albert:Loading TF weight output_bias/adam_m with shape [3]
INFO:transformers.modeling_albert:Loading TF weight output_bias/adam_v with shape [3]
INFO:transformers.modeling_albert:Loading TF weight output_weights with shape [3, 4096]
INFO:transformers.modeling_albert:Loading TF weight output_weights/adam_m with shape [3, 4096]
INFO:transformers.modeling_albert:Loading TF weight output_weights/adam_v with shape [3, 4096]
bert/embeddings/LayerNorm/beta
bert/embeddings/LayerNorm/beta/adam_m
bert/embeddings/LayerNorm/beta/adam_v
bert/embeddings/LayerNorm/gamma
bert/embeddings/LayerNorm/gamma/adam_m
bert/embeddings/LayerNorm/gamma/adam_v
bert/embeddings/position_embeddings
bert/embeddings/position_embeddings/adam_m
bert/embeddings/position_embeddings/adam_v
bert/embeddings/token_type_embeddings
bert/embeddings/token_type_embeddings/adam_m
bert/embeddings/token_type_embeddings/adam_v
bert/embeddings/word_embeddings
bert/embeddings/word_embeddings/adam_m
bert/embeddings/word_embeddings/adam_v
bert/encoder/embedding_hidden_mapping_in/bias
bert/encoder/embedding_hidden_mapping_in/bias/adam_m
bert/encoder/embedding_hidden_mapping_in/bias/adam_v
bert/encoder/embedding_hidden_mapping_in/kernel
bert/encoder/embedding_hidden_mapping_in/kernel/adam_m
bert/encoder/embedding_hidden_mapping_in/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
global_step
output_bias
output_bias/adam_m
output_bias/adam_v
output_weights
output_weights/adam_m
output_weights/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
INFO:transformers.modeling_albert:Skipping albert/embeddings/position_embeddings/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/position_embeddings/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'token_type_embeddings'] from bert/embeddings/token_type_embeddings
INFO:transformers.modeling_albert:Skipping albert/embeddings/token_type_embeddings/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/token_type_embeddings/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'word_embeddings'] from bert/embeddings/word_embeddings
INFO:transformers.modeling_albert:Skipping albert/embeddings/word_embeddings/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/word_embeddings/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'embedding_hidden_mapping_in', 'bias'] from bert/encoder/embedding_hidden_mapping_in/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/embedding_hidden_mapping_in/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/embedding_hidden_mapping_in/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'embedding_hidden_mapping_in', 'kernel'] from bert/encoder/embedding_hidden_mapping_in/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/embedding_hidden_mapping_in/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/embedding_hidden_mapping_in/kernel/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'LayerNorm', 'beta'] from bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/LayerNorm/beta/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'LayerNorm', 'gamma'] from bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/LayerNorm/gamma/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'full_layer_layer_norm', 'beta'] from bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/full_layer_layer_norm/beta/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/full_layer_layer_norm/beta/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'full_layer_layer_norm', 'gamma'] from bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/full_layer_layer_norm/gamma/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/full_layer_layer_norm/gamma/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'dense', 'bias'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/dense/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/dense/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'dense', 'kernel'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/dense/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/dense/kernel/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'key', 'bias'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/key/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/key/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'key', 'kernel'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/key/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/key/kernel/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'query', 'bias'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/query/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/query/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'query', 'kernel'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/query/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/query/kernel/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'value', 'bias'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/value/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/value/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'attention', 'value', 'kernel'] from bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/value/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/attention/value/kernel/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'ffn', 'bias'] from bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'ffn', 'kernel'] from bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn/kernel/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'ffn_output', 'bias'] from bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn_output/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn_output/bias/adam_v
Initialize PyTorch weight ['albert', 'encoder', 'albert_layer_groups', '0', 'albert_layers', '0', 'ffn_output', 'kernel'] from bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn_output/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/encoder/albert_layer_groups/0/albert_layers/0/ffn_output/kernel/adam_v
Initialize PyTorch weight ['albert', 'pooler', 'bias'] from bert/pooler/dense/bias
INFO:transformers.modeling_albert:Skipping albert/pooler/bias/adam_m
INFO:transformers.modeling_albert:Skipping albert/pooler/bias/adam_v
Initialize PyTorch weight ['albert', 'pooler', 'kernel'] from bert/pooler/dense/kernel
INFO:transformers.modeling_albert:Skipping albert/pooler/kernel/adam_m
INFO:transformers.modeling_albert:Skipping albert/pooler/kernel/adam_v
INFO:transformers.modeling_albert:Skipping global_step
INFO:transformers.modeling_albert:Skipping classifier/output_bias
Traceback (most recent call last):
File "/content/transformers/src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py", line 61, in <module>
convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.albert_config_file, args.pytorch_dump_path)
File "/content/transformers/src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
File "/content/drive/My Drive/transformers/src/transformers/modeling_albert.py", line 140, in load_tf_weights_in_albert
pointer = getattr(pointer, "bias")
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
type(self).__name__, name))
AttributeError: 'AlbertForMaskedLM' object has no attribute 'bias'
I totally understand that, since I am using a fine-tuned model, I should use the AlbertForSequenceClassification or AlbertForQuestionAnswering class instead of AlbertForMaskedLM. I actually tried that and nothing changed. Below is the error message I got:
(The configuration dump and weight-loading log are identical to the run above; only the final traceback differs:)
Traceback (most recent call last):
File "/content/transformers/src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py", line 61, in <module>
convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.albert_config_file, args.pytorch_dump_path)
File "/content/transformers/src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
File "/content/drive/My Drive/transformers/src/transformers/modeling_albert.py", line 140, in load_tf_weights_in_albert
pointer = getattr(pointer, "bias")
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
type(self).__name__, name))
AttributeError: 'AlbertForQuestionAnswering' object has no attribute 'bias'
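For reference, what I mean by swapping the model class is roughly the following (a hypothetical stand-alone sketch of the same steps the conversion script performs; the stock script hard-codes AlbertForMaskedLM). It hits the same failure, because load_tf_weights_in_albert still reaches getattr(pointer, "bias") while mapping the fine-tuning head variable, and neither task class has a matching attribute at that point:

```python
# Rough sketch of the class swap (hypothetical stand-alone script, not the
# stock convert_albert_original_tf_checkpoint_to_pytorch.py).
import torch
from transformers import AlbertConfig, AlbertForQuestionAnswering
from transformers.modeling_albert import load_tf_weights_in_albert

config = AlbertConfig.from_json_file("/content/pretrained_models/albertsquad/config.json")
model = AlbertForQuestionAnswering(config)

# Still raises AttributeError: the loader ends up calling getattr(model, "bias")
# while mapping the task-head variable output_bias.
load_tf_weights_in_albert(model, config, "/content/pretrained_models/albertsquad/model.ckpt-best")
torch.save(model.state_dict(), "/content/pretrained_models/albertsquad/pytorch_model.bin")
```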
Expected behavior
This behavior only happens with a model fine-tuned on SQuAD or GLUE in TF. I have managed to convert TF checkpoints that were not fine-tuned, and they work fine. However, if I fine-tune my model in TF on SQuAD, then I can't convert the checkpoint.
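A possible workaround, as a sketch only (nothing the library provides; the output path below is hypothetical): write a stripped copy of the fine-tuned checkpoint that keeps just the bert/... weights, run the stock conversion script on that copy, and transfer the task head separately.

```python
# Sketch of a checkpoint-stripping workaround (assumption: dropping the TF task
# head and Adam slots from this copy is acceptable, since the head would be
# transferred separately afterwards).
import os
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

src = "/content/pretrained_models/albertsquad/model.ckpt-best"            # fine-tuned checkpoint
dst = "/content/pretrained_models/albertsquad_stripped/model.ckpt-best"   # hypothetical output path
os.makedirs(os.path.dirname(dst), exist_ok=True)

with tf.Session() as sess:
    kept = []
    for name, _ in tf.train.list_variables(src):
        # Keep only the encoder/embedding/pooler weights; drop the task head
        # (output_weights, output_bias) and the adam_m/adam_v optimizer slots.
        if name.startswith("bert/") and "adam_m" not in name and "adam_v" not in name:
            value = tf.train.load_variable(src, name)
            kept.append(tf.Variable(value, name=name))
    sess.run(tf.global_variables_initializer())
    tf.train.Saver(kept).save(sess, dst)
```

The converted pytorch_model.bin would then only carry the encoder and embedding weights; the fine-tuned head (output_weights, output_bias) would still have to be copied into the PyTorch task model by hand.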
Environment info
- transformers version: latest
- Platform: Google Colab
- Python version: 3.6.9
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
This problem has not been fixed for a long time. Please have a look at this post: https://github.com/huggingface/transformers/issues/2006
Top GitHub Comments
Our hero LysandreJik assigned this problem to himself. Let's have confidence in him to solve it (:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.