[RAG] Expected RAG output after fine tuning
Hi there.
Perhaps the following isn’t even a real issue, but I’m a bit confused by the outputs I got.
I’m trying to fine-tune RAG on a set of question-answer pairs I have (not many for now, fewer than 1k). I split them as suggested (train.source, train.target, val.source, …). After running finetune_rag.py, the only outputs generated were two small files (~2 kB):
- git_log.json
- hparams.pkl
Is that right? I was expecting a big binary file (or something similar) containing the weight matrices, so that I could reuse them afterwards in a new trial.
Could you please tell me what I’m missing here?
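For reference, this is roughly how I intended to reuse the fine-tuned weights in a new trial (just a sketch, assuming output_ft would end up holding a model in Hugging Face format and a reasonably recent transformers version; the question is a placeholder):

from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

# Assumes output_ft contains config.json, pytorch_model.bin and both tokenizer
# directories; loading the retriever this way reuses the wiki_dpr exact index.
tokenizer = RagTokenizer.from_pretrained("output_ft")
retriever = RagRetriever.from_pretrained("output_ft", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("output_ft", retriever=retriever)

inputs = tokenizer("Who wrote Hamlet?", return_tensors="pt")  # placeholder question
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))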
I provide more details below. By the way, I have two NVIDIA RTX 3090 GPUs (24 GB each), but they were barely used during the whole process (which took ~3 hours).
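For completeness, here is roughly how I prepared the split files (a minimal sketch of my own helper, not part of the repo; the QA pairs below are placeholders for my real data):

import os

# finetune_rag.py reads line-aligned files from --data_dir: line N of
# <split>.source holds a question and line N of <split>.target its answer.
pairs = [
    ("What does RAG stand for?", "Retrieval-augmented generation."),
    ("What dataset does rag-sequence-base retrieve from?", "wiki_dpr."),
]
data_dir = "rag_manual_qa_finetuning"
os.makedirs(data_dir, exist_ok=True)
for split in ("train", "val", "test"):
    # Placeholder: writes the same pairs to every split; my real data is split properly.
    with open(os.path.join(data_dir, f"{split}.source"), "w") as src, \
         open(os.path.join(data_dir, f"{split}.target"), "w") as tgt:
        for question, answer in pairs:
            src.write(question + "\n")
            tgt.write(answer + "\n")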
Command:
python finetune_rag.py \
--data_dir rag_manual_qa_finetuning \
--output_dir output_ft \
--model_name_or_path rag-sequence-base \
--model_type rag_sequence \
--gpus 2 \
--distributed_retriever pytorch
Logs (oddly, they even seem to be generated in duplicate, though I don’t know why):
loading configuration file rag-sequence-base/config.json
Model config RagConfig {
"architectures": [
"RagSequenceForGeneration"
],
"dataset": "wiki_dpr",
"dataset_split": "train",
"do_deduplication": true,
"do_marginalize": false,
"doc_sep": " // ",
"exclude_bos_score": false,
"forced_eos_token_id": 2,
"generator": {
"_name_or_path": "",
"_num_labels": 3,
"activation_dropout": 0.0,
"activation_function": "gelu",
"add_bias_logits": false,
"add_cross_attention": false,
"add_final_layer_norm": false,
"architectures": [
"BartModel",
"BartForMaskedLM",
"BartForSequenceClassification"
],
"attention_dropout": 0.0,
"bad_words_ids": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"classif_dropout": 0.0,
"classifier_dropout": 0.0,
"d_model": 1024,
"decoder_attention_heads": 16,
"decoder_ffn_dim": 4096,
"decoder_layerdrop": 0.0,
"decoder_layers": 12,
"decoder_start_token_id": 2,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.1,
"early_stopping": false,
"encoder_attention_heads": 16,
"encoder_ffn_dim": 4096,
"encoder_layerdrop": 0.0,
"encoder_layers": 12,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"extra_pos_embeddings": 2,
"finetuning_task": null,
"force_bos_token_to_be_generated": false,
"forced_bos_token_id": null,
"forced_eos_token_id": 2,
"gradient_checkpointing": false,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1",
"2": "LABEL_2"
},
"init_std": 0.02,
"is_decoder": false,
"is_encoder_decoder": true,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1,
"LABEL_2": 2
},
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 1024,
"min_length": 0,
"model_type": "bart",
"no_repeat_ngram_size": 0,
"normalize_before": false,
"normalize_embedding": true,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": " ",
"pruned_heads": {},
"repetition_penalty": 1.0,
"return_dict": false,
"return_dict_in_generate": false,
"scale_embedding": false,
"sep_token_id": null,
"static_position_embeddings": false,
"task_specific_params": {
"summarization": {
"early_stopping": true,
"length_penalty": 2.0,
"max_length": 142,
"min_length": 56,
"no_repeat_ngram_size": 3,
"num_beams": 4
}
},
"temperature": 1.0,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"transformers_version": "4.4.0.dev0",
"use_bfloat16": false,
"use_cache": true,
"vocab_size": 50265
},
"index_name": "exact",
"index_path": null,
"is_encoder_decoder": true,
"label_smoothing": 0.0,
"max_combined_length": 300,
"model_type": "rag",
"n_docs": 5,
"output_retrieved": false,
"passages_path": null,
"question_encoder": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": [
"DPRQuestionEncoder"
],
"attention_probs_dropout_prob": 0.1,
"bad_words_ids": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-12,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 512,
"min_length": 0,
"model_type": "dpr",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"prefix": null,
"projection_dim": 0,
"pruned_heads": {},
"repetition_penalty": 1.0,
"return_dict": false,
"return_dict_in_generate": false,
"sep_token_id": null,
"task_specific_params": null,
"temperature": 1.0,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"transformers_version": "4.4.0.dev0",
"type_vocab_size": 2,
"use_bfloat16": false,
"use_cache": true,
"vocab_size": 30522
},
"reduce_loss": false,
"retrieval_batch_size": 8,
"retrieval_vector_size": 768,
"title_sep": " / ",
"use_cache": true,
"use_dummy_dataset": false,
"vocab_size": null
}
Model name 'rag-sequence-base' not found in model shortcut name list (facebook/dpr-question_encoder-single-nq-base, facebook/dpr-question_encoder-multiset-base). Assuming 'rag-sequence-base' is a path, a model identifier, or url to a directory containing tokenizer files.
Didn't find file rag-sequence-base/question_encoder_tokenizer/tokenizer.json. We won't load it.
Didn't find file rag-sequence-base/question_encoder_tokenizer/added_tokens.json. We won't load it.
loading file rag-sequence-base/question_encoder_tokenizer/vocab.txt
loading file None
loading file None
loading file rag-sequence-base/question_encoder_tokenizer/special_tokens_map.json
loading file rag-sequence-base/question_encoder_tokenizer/tokenizer_config.json
Model name 'rag-sequence-base' not found in model shortcut name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). Assuming 'rag-sequence-base' is a path, a model identifier, or url to a directory containing tokenizer files.
Didn't find file rag-sequence-base/generator_tokenizer/tokenizer.json. We won't load it.
Didn't find file rag-sequence-base/generator_tokenizer/added_tokens.json. We won't load it.
loading file rag-sequence-base/generator_tokenizer/vocab.json
loading file rag-sequence-base/generator_tokenizer/merges.txt
loading file None
loading file None
loading file rag-sequence-base/generator_tokenizer/special_tokens_map.json
loading file rag-sequence-base/generator_tokenizer/tokenizer_config.json
Loading passages from wiki_dpr
Downloading: 9.64kB [00:00, 10.8MB/s]
Downloading: 67.5kB [00:00, 59.5MB/s]
WARNING:datasets.builder:Using custom data configuration psgs_w100.nq.no_index-dummy=False,with_index=False
Downloading and preparing dataset wiki_dpr/psgs_w100.nq.no_index (download: 66.09 GiB, generated: 73.03 GiB, post-processed: Unknown size, total: 139.13 GiB) to /home/usp/.cache/huggingface/datasets/wiki_dpr/psgs_w100.nq.no_index-dummy=False,with_index=False/0.0.0/91b145e64f5bc8b55a7b3e9f730786ad6eb19cd5bc020e2e02cdf7d0cb9db9c1...
Downloading: 100%|█████████████████████████| 4.69G/4.69G [07:11<00:00, 10.9MB/s]
Downloading: 100%|█████████████████████████| 1.33G/1.33G [02:27<00:00, 9.00MB/s]
[... 49 more ~1.33 GB shard downloads omitted for brevity ...]
0 examples [00:00, ? examples/s]2021-03-05 12:11:39.666323: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Dataset wiki_dpr downloaded and prepared to /home/usp/.cache/huggingface/datasets/wiki_dpr/psgs_w100.nq.no_index-dummy=False,with_index=False/0.0.0/91b145e64f5bc8b55a7b3e9f730786ad6eb19cd5bc020e2e02cdf7d0cb9db9c1. Subsequent calls will reuse this data.
loading weights file rag-sequence-base/pytorch_model.bin
All model checkpoint weights were used when initializing RagSequenceForGeneration.
Some weights of RagSequenceForGeneration were not initialized from the model checkpoint at rag-sequence-base and are newly initialized: ['rag.generator.lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
loading configuration file rag-sequence-base/config.json
[... the full RagConfig dump and tokenizer-loading messages above are then repeated verbatim; this is the duplication I mentioned ...]
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:lightning:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Top GitHub Comments
Hi! If I recall correctly, the model is saved via PyTorch Lightning's on_save_checkpoint, so the issue might come from the checkpointing config at
https://github.com/huggingface/transformers/blob/2295d783d5787bcd4c99ea0ddb2a9403697fc126/examples/research_projects/rag/callbacks_rag.py#L36-L43
Pinging @lhoestq and @patrickvonplaten
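For context, the callback there is a standard Lightning ModelCheckpoint; here is a minimal sketch of that pattern (illustrative argument values, not the exact ones used in callbacks_rag.py):

from pytorch_lightning.callbacks import ModelCheckpoint

# If the monitored metric is never logged (e.g. validation never runs), or
# save_top_k is 0, Lightning writes no .ckpt file at all, which would match
# an output_dir containing only git_log.json and hparams.pkl.
checkpoint_callback = ModelCheckpoint(
    dirpath="output_ft",  # where .ckpt checkpoints should be written
    monitor="val_loss",   # hypothetical metric name; must actually be logged during validation
    mode="min",           # keep checkpoints that minimize the monitored value
    save_top_k=1,         # keep only the single best checkpoint
)

Once a .ckpt does get written, it should be loadable with Lightning's load_from_checkpoint on the training module, if I'm not mistaken.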