In run_xnli.py, output_dir seems to be used in place of tokenizer_name
🐛 Bug
Information
I am trying to run the run_xnli example as found in the documentation. Unfortunately, I get a strange error where the script thinks the output_dir argument contains a model name.
It seems that output_dir has been used in place of tokenizer_name in some instances, such as this:
tokenizer = tokenizer_class.from_pretrained(args.output_dir, do_lower_case=args.do_lower_case)
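For illustration, here is a minimal sketch of one way the tokenizer source could be chosen so that output_dir is only used when it already holds a saved vocabulary. The helper name resolve_tokenizer_source and the fallback order are my own assumptions, not code from run_xnli.py; the argument names (output_dir, tokenizer_name, model_name_or_path) and the vocab.txt file name are taken from the script's flags and the error message.

```python
import os
from argparse import Namespace


def resolve_tokenizer_source(args, vocab_file="vocab.txt"):
    """Hypothetical helper: pick the name or path to pass to from_pretrained."""
    # Use output_dir only if a vocabulary file has already been saved there,
    # e.g. by an earlier training run.
    if args.output_dir and os.path.isfile(os.path.join(args.output_dir, vocab_file)):
        return args.output_dir
    # Otherwise fall back to the explicit tokenizer name, then the model name.
    return args.tokenizer_name or args.model_name_or_path


# Example arguments (assumed values, for demonstration only).
args = Namespace(
    output_dir="/tmp/debug_xnli/",
    tokenizer_name="",
    model_name_or_path="bert-base-multilingual-cased",
)
print(resolve_tokenizer_source(args))
```

In the script this might be used as tokenizer_class.from_pretrained(resolve_tokenizer_source(args), do_lower_case=args.do_lower_case), so an empty or missing output directory would no longer be treated as a model name.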
To reproduce
Steps to reproduce the behavior:
- Follow the example as found here: https://huggingface.co/transformers/examples.html#xnli
I get the following error:
Traceback (most recent call last):
  File "run_xnli.py", line 646, in <module>
    main()
  File "run_xnli.py", line 624, in main
    tokenizer = tokenizer_class.from_pretrained(args.output_dir, do_lower_case=args.do_lower_case)
  File "/mounts/Users/cisintern/antmarakis/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 868, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/mounts/Users/cisintern/antmarakis/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 971, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name '/tmp/debug_xnli/' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed '/tmp/debug_xnli/' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
To be honest, someone should rewrite this script according to #3800
In case you want to do it @antmarakis (we can help) 😊
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.