[RAG] Expected RAG output after fine tuning
Hi there.
Perhaps the following isn’t even a real issue, but I’m a bit confused by the outputs I got.
I’m trying to fine-tune RAG on a set of question-answer pairs I have (not many for now, fewer than 1k). I split them as suggested (train.source, train.target, val.source, …). After running finetune_rag.py, the only outputs generated were two small files (~2 kB):
- git_log.json
- hparams.pkl
Is that right? I was expecting a big binary file (or something similar) containing the weight matrices, so that I could reuse them afterwards in a new trial.
Could you please tell me what I’m missing here?
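For reference, this is roughly how I intended to reuse the fine-tuned weights in a new trial (just a sketch, assuming output_ft would end up holding a model in Hugging Face format and a reasonably recent transformers version; the question is a placeholder):

from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

# Assumes output_ft contains config.json, pytorch_model.bin and both tokenizer
# directories; loading the retriever this way reuses the wiki_dpr exact index.
tokenizer = RagTokenizer.from_pretrained("output_ft")
retriever = RagRetriever.from_pretrained("output_ft", index_name="exact")
model = RagSequenceForGeneration.from_pretrained("output_ft", retriever=retriever)

inputs = tokenizer("Who wrote Hamlet?", return_tensors="pt")  # placeholder question
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))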
I provide more details below. By the way, I have two NVIDIA RTX 3090 GPUs (24 GB each), but they were barely used during the whole process (which took ~3 hours).
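For completeness, here is roughly how I prepared the split files (a minimal sketch of my own helper, not part of the repo; the QA pairs below are placeholders for my real data):

import os

# finetune_rag.py reads line-aligned files from --data_dir: line N of
# <split>.source holds a question and line N of <split>.target its answer.
pairs = [
    ("What does RAG stand for?", "Retrieval-augmented generation."),
    ("What dataset does rag-sequence-base retrieve from?", "wiki_dpr."),
]
data_dir = "rag_manual_qa_finetuning"
os.makedirs(data_dir, exist_ok=True)
for split in ("train", "val", "test"):
    # Placeholder: writes the same pairs to every split; my real data is split properly.
    with open(os.path.join(data_dir, f"{split}.source"), "w") as src, \
         open(os.path.join(data_dir, f"{split}.target"), "w") as tgt:
        for question, answer in pairs:
            src.write(question + "\n")
            tgt.write(answer + "\n")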
Command:
python finetune_rag.py \
--data_dir rag_manual_qa_finetuning \
--output_dir output_ft \
--model_name_or_path rag-sequence-base \
--model_type rag_sequence \
--gpus 2 \
--distributed_retriever pytorch
Logs (oddly, they even seem to be generated in duplicate, though I don’t know why):
loading configuration file rag-sequence-base/config.json
Model config RagConfig {
"architectures": [
"RagSequenceForGeneration"
],
"dataset": "wiki_dpr",
"dataset_split": "train",
"do_deduplication": true,
"do_marginalize": false,
"doc_sep": " // ",
"exclude_bos_score": false,
"forced_eos_token_id": 2,
"generator": {
"_name_or_path": "",
"_num_labels": 3,
"activation_dropout": 0.0,
"activation_function": "gelu",
"add_bias_logits": false,
"add_cross_attention": false,
"add_final_layer_norm": false,
"architectures": [
"BartModel",
"BartForMaskedLM",
"BartForSequenceClassification"
],
"attention_dropout": 0.0,
"bad_words_ids": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"classif_dropout": 0.0,
"classifier_dropout": 0.0,
"d_model": 1024,
"decoder_attention_heads": 16,
"decoder_ffn_dim": 4096,
"decoder_layerdrop": 0.0,
"decoder_layers": 12,
"decoder_start_token_id": 2,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.1,
"early_stopping": false,
"encoder_attention_heads": 16,
"encoder_ffn_dim": 4096,
"encoder_layerdrop": 0.0,
"encoder_layers": 12,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"extra_pos_embeddings": 2,
"finetuning_task": null,
"force_bos_token_to_be_generated": false,
"forced_bos_token_id": null,
"forced_eos_token_id": 2,
"gradient_checkpointing": false,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1",
"2": "LABEL_2"
},
"init_std": 0.02,
"is_decoder": false,
"is_encoder_decoder": true,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1,
"LABEL_2": 2
},
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 1024,
"min_length": 0,
"model_type": "bart",
"no_repeat_ngram_size": 0,
"normalize_before": false,
"normalize_embedding": true,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": " ",
"pruned_heads": {},
"repetition_penalty": 1.0,
"return_dict": false,
"return_dict_in_generate": false,
"scale_embedding": false,
"sep_token_id": null,
"static_position_embeddings": false,
"task_specific_params": {
"summarization": {
"early_stopping": true,
"length_penalty": 2.0,
"max_length": 142,
"min_length": 56,
"no_repeat_ngram_size": 3,
"num_beams": 4
}
},
"temperature": 1.0,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"transformers_version": "4.4.0.dev0",
"use_bfloat16": false,
"use_cache": true,
"vocab_size": 50265
},
"index_name": "exact",
"index_path": null,
"is_encoder_decoder": true,
"label_smoothing": 0.0,
"max_combined_length": 300,
"model_type": "rag",
"n_docs": 5,
"output_retrieved": false,
"passages_path": null,
"question_encoder": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": [
"DPRQuestionEncoder"
],
"attention_probs_dropout_prob": 0.1,
"bad_words_ids": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-12,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 512,
"min_length": 0,
"model_type": "dpr",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"prefix": null,
"projection_dim": 0,
"pruned_heads": {},
"repetition_penalty": 1.0,
"return_dict": false,
"return_dict_in_generate": false,
"sep_token_id": null,
"task_specific_params": null,
"temperature": 1.0,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"transformers_version": "4.4.0.dev0",
"type_vocab_size": 2,
"use_bfloat16": false,
"use_cache": true,
"vocab_size": 30522
},
"reduce_loss": false,
"retrieval_batch_size": 8,
"retrieval_vector_size": 768,
"title_sep": " / ",
"use_cache": true,
"use_dummy_dataset": false,
"vocab_size": null
}
Model name 'rag-sequence-base' not found in model shortcut name list (facebook/dpr-question_encoder-single-nq-base, facebook/dpr-question_encoder-multiset-base). Assuming 'rag-sequence-base' is a path, a model identifier, or url to a directory containing tokenizer files.
Didn't find file rag-sequence-base/question_encoder_tokenizer/tokenizer.json. We won't load it.
Didn't find file rag-sequence-base/question_encoder_tokenizer/added_tokens.json. We won't load it.
loading file rag-sequence-base/question_encoder_tokenizer/vocab.txt
loading file None
loading file None
loading file rag-sequence-base/question_encoder_tokenizer/special_tokens_map.json
loading file rag-sequence-base/question_encoder_tokenizer/tokenizer_config.json
Model name 'rag-sequence-base' not found in model shortcut name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). Assuming 'rag-sequence-base' is a path, a model identifier, or url to a directory containing tokenizer files.
Didn't find file rag-sequence-base/generator_tokenizer/tokenizer.json. We won't load it.
Didn't find file rag-sequence-base/generator_tokenizer/added_tokens.json. We won't load it.
loading file rag-sequence-base/generator_tokenizer/vocab.json
loading file rag-sequence-base/generator_tokenizer/merges.txt
loading file None
loading file None
loading file rag-sequence-base/generator_tokenizer/special_tokens_map.json
loading file rag-sequence-base/generator_tokenizer/tokenizer_config.json
Loading passages from wiki_dpr
Downloading: 9.64kB [00:00, 10.8MB/s]
Downloading: 67.5kB [00:00, 59.5MB/s]
WARNING:datasets.builder:Using custom data configuration psgs_w100.nq.no_index-dummy=False,with_index=False
Downloading and preparing dataset wiki_dpr/psgs_w100.nq.no_index (download: 66.09 GiB, generated: 73.03 GiB, post-processed: Unknown size, total: 139.13 GiB) to /home/usp/.cache/huggingface/datasets/wiki_dpr/psgs_w100.nq.no_index-dummy=False,with_index=False/0.0.0/91b145e64f5bc8b55a7b3e9f730786ad6eb19cd5bc020e2e02cdf7d0cb9db9c1...
Downloading: 100%|█████████████████████████| 4.69G/4.69G [07:11<00:00, 10.9MB/s]
Downloading: 100%|█████████████████████████| 1.33G/1.33G [02:27<00:00, 9.00MB/s]
[... 49 more ~1.33 GB shard downloads omitted for brevity ...]
0 examples [00:00, ? examples/s]2021-03-05 12:11:39.666323: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Dataset wiki_dpr downloaded and prepared to /home/usp/.cache/huggingface/datasets/wiki_dpr/psgs_w100.nq.no_index-dummy=False,with_index=False/0.0.0/91b145e64f5bc8b55a7b3e9f730786ad6eb19cd5bc020e2e02cdf7d0cb9db9c1. Subsequent calls will reuse this data.
loading weights file rag-sequence-base/pytorch_model.bin
All model checkpoint weights were used when initializing RagSequenceForGeneration.
Some weights of RagSequenceForGeneration were not initialized from the model checkpoint at rag-sequence-base and are newly initialized: ['rag.generator.lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
loading configuration file rag-sequence-base/config.json
[... the full RagConfig dump and tokenizer-loading messages above are then repeated verbatim; this is the duplication I mentioned ...]
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:lightning:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Top GitHub Comments
Hi! If I recall correctly, the model is saved via PyTorch Lightning's on_save_checkpoint, so the issue might come from the checkpointing config at
https://github.com/huggingface/transformers/blob/2295d783d5787bcd4c99ea0ddb2a9403697fc126/examples/research_projects/rag/callbacks_rag.py#L36-L43
Pinging @lhoestq and @patrickvonplaten
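For context, the callback there is a standard Lightning ModelCheckpoint; here is a minimal sketch of that pattern (illustrative argument values, not the exact ones used in callbacks_rag.py):

from pytorch_lightning.callbacks import ModelCheckpoint

# If the monitored metric is never logged (e.g. validation never runs), or
# save_top_k is 0, Lightning writes no .ckpt file at all, which would match
# an output_dir containing only git_log.json and hparams.pkl.
checkpoint_callback = ModelCheckpoint(
    dirpath="output_ft",  # where .ckpt checkpoints should be written
    monitor="val_loss",   # hypothetical metric name; must actually be logged during validation
    mode="min",           # keep checkpoints that minimize the monitored value
    save_top_k=1,         # keep only the single best checkpoint
)

Once a .ckpt does get written, it should be loadable with Lightning's load_from_checkpoint on the training module, if I'm not mistaken.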