Model name 'facebook/rag-sequence-base/*' not found when running examples/rag/finetune.sh
See original GitHub issueEnvironment info
transformers
version: 3.3.1- Platform: Linux-4.15.0-38-generic-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- PyTorch version (GPU?): 1.6.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: True (Retriever is distributed)
Who can help
Information
Model I am using (Bert, XLNet …):
facebook/rag-sequence-base
The problem arises when using:
- [x ] the official example scripts: (give details below) examples/rag/finetune.sh
The tasks I am working on is:
- [x ] my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
run sh finetune.sh
with
DATA_DIR=data_dir
OUTPUT_DIR=output_dir
MODEL_NAME_OR_PATH="facebook/rag-sequence-base"
gives:
Model name ‘facebook/rag-sequence-base/question_encoder_tokenizer’ not found in model shortcut name list (facebook/dpr-question_encoder-single-nq-base). Assuming ‘facebook/rag-sequence-base/question_encoder_tokenizer’ is a path, a model identifier, or url to a directory containing tokenizer files. loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/vocab.txt from cache at /h/asabet/.cache/torch/transformers/14d599f015518cd5b95b5d567b8c06b265dbbf04047e44b3654efd7cbbacb697.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084 loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/added_tokens.json from cache at None loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/special_tokens_map.json from cache at /h/asabet/.cache/torch/transformers/70614c7a84151409876eaaaecb3b5185213aa5c560926855e35753b9909f1116.275045728fbf41c11d3dae08b8742c054377e18d92cc7b72b6351152a99b64e4 loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/tokenizer_config.json from cache at /h/asabet/.cache/torch/transformers/8ade9cf561f8c0a47d1c3785e850c57414d776b3795e21bd01e58483399d2de4.11f57497ee659e26f830788489816dbcb678d91ae48c06c50c9dc0e4438ec05b loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/question_encoder_tokenizer/tokenizer.json from cache at None Model name ‘facebook/rag-sequence-base/generator_tokenizer’ not found in model shortcut name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). Assuming ‘facebook/rag-sequence-base/generator_tokenizer’ is a path, a model identifier, or url to a directory containing tokenizer files. loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/vocab.json from cache at /h/asabet/.cache/torch/transformers/3b9637b6eab4a48cf2bc596e5992aebb74de6e32c9ee660a27366a63a8020557.6a4061e8fc00057d21d80413635a86fdcf55b6e7594ad9e25257d2f99a02f4be loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/merges.txt from cache at /h/asabet/.cache/torch/transformers/b2a6adcb3b8a4c39e056d80a133951b99a56010158602cf85dee775936690c6a.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/added_tokens.json from cache at None loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/special_tokens_map.json from cache at /h/asabet/.cache/torch/transformers/342599872fb2f45f954699d3c67790c33b574cc552a4b433fedddc97e6a3c58e.6e217123a3ada61145de1f20b1443a1ec9aac93492a4bd1ce6a695935f0fd97a loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/tokenizer_config.json from cache at /h/asabet/.cache/torch/transformers/e5f72dc4c0b1ba585d7afb7fa5e3e52ff0e1f101e49572e2caaf38fab070d4d6.d596a549211eb890d3bb341f3a03307b199bc2d5ed81b3451618cbcb04d1f1bc loading file https://s3.amazonaws.com/models.huggingface.co/bert/facebook/rag-sequence-base/generator_tokenizer/tokenizer.json from cache at None Traceback (most recent call last): File “finetune.py”, line 499, in <module> main(args) File “finetune.py”, line 439, in main model: GenerativeQAModule = GenerativeQAModule(args) File “finetune.py”, line 105, in init retriever = RagPyTorchDistributedRetriever.from_pretrained(hparams.model_name_or_path, config=config) File “/h/asabet/.local/lib/python3.6/site-packages/transformers/retrieval_rag.py”, line 308, in from_pretrained config, question_encoder_tokenizer=question_encoder_tokenizer, generator_tokenizer=generator_tokenizer File “/scratch/ssd001/home/asabet/transformers/examples/rag/distributed_retriever.py”, line 41, in init index=index, TypeError: init() got an unexpected keyword argument ‘index’
Expected behavior
finetune.sh should launch and run
Issue Analytics
- State:
- Created 3 years ago
- Reactions:3
- Comments:15 (11 by maintainers)
Initial poster seems to be running
transformers version: 3.3.1
which makes me suspect it might not be related to the git/git-lfs migrationUpdate: @lhoestq is looking into it
Hi, I have a related issue. This happen to
"facebook/rag-token-base"
and"facebook/rag-token-nq"
and"facebook/rag-sequence-nq"
as well.Basic loading failed (was able to do it until around 2 days ago – I use version 3.5.0) Both
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
andretriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
result in the same error message:
OSError: Can't load tokenizer for 'facebook/rag-sequence-nq/question_encoder_tokenizer'.
<<< Seem like it add the wrong path
question_encoder_tokenizer
at the end.