In run_xnli.py, output_dir seems to be used in place of tokenizer_name

🐛 Bug

Information

I am trying to run the run_xnli example as found in the documentation. Unfortunately, I get a strange error where the script treats the output_dir argument as a model name.

It seems that output_dir has been used in place of tokenizer_name in some places, for example:

    tokenizer = tokenizer_class.from_pretrained(args.output_dir, do_lower_case=args.do_lower_case)
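One possible workaround, sketched here under assumptions rather than taken from the script itself, is to choose the tokenizer source from the model-related arguments instead of output_dir. The helper pick_tokenizer_source is hypothetical, and the attribute names tokenizer_name and model_name_or_path are assumed to match the script's command-line arguments.

```python
from argparse import Namespace


def pick_tokenizer_source(args):
    """Return the name or path the tokenizer should be loaded from.

    Hypothetical helper: prefers an explicit tokenizer name and falls back
    to the model name. Both attribute names are assumptions about the
    arguments that run_xnli.py defines.
    """
    return getattr(args, "tokenizer_name", None) or args.model_name_or_path


# Example arguments; the values are illustrative only.
args = Namespace(
    model_name_or_path="bert-base-multilingual-cased",
    tokenizer_name="",  # empty means "not set"
    output_dir="/tmp/debug_xnli/",
)

source = pick_tokenizer_source(args)
print(source)
```

The returned value could then be passed to tokenizer_class.from_pretrained in place of args.output_dir, so the output directory is never interpreted as a model name.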

To reproduce

Steps to reproduce the behavior:

Follow the example as found here: https://huggingface.co/transformers/examples.html#xnli

I get the following error:

Traceback (most recent call last):
  File "run_xnli.py", line 646, in <module>
    main()
  File "run_xnli.py", line 624, in main
    tokenizer = tokenizer_class.from_pretrained(args.output_dir, do_lower_case=args.do_lower_case)
  File "/mounts/Users/cisintern/antmarakis/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 868, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/mounts/Users/cisintern/antmarakis/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 971, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name '/tmp/debug_xnli/' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed '/tmp/debug_xnli/' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
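The message says that no vocab.txt was found under /tmp/debug_xnli/. A quick way to confirm this before loading a tokenizer from a directory is to check for the file directly. This is a minimal sketch: has_vocab_file is a hypothetical helper, and the file name vocab.txt is taken from the error message above.

```python
import os
import tempfile


def has_vocab_file(path, vocab_filename="vocab.txt"):
    """Return True if `path` is a directory that contains the vocabulary file."""
    return os.path.isfile(os.path.join(path, vocab_filename))


# Demonstrate on a temporary directory so the example has no side effects.
with tempfile.TemporaryDirectory() as tmp_dir:
    found_before = has_vocab_file(tmp_dir)  # directory is still empty

    # Write a placeholder vocabulary file, as a saved tokenizer would.
    with open(os.path.join(tmp_dir, "vocab.txt"), "w") as f:
        f.write("[PAD]\n")

    found_after = has_vocab_file(tmp_dir)

print(found_before, found_after)
```

If the check fails for output_dir, the tokenizer was presumably never saved there, for example because the evaluation step ran without a preceding training step writing to that directory.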

Issue Analytics

Created 3 years ago
Comments: 5 (4 by maintainers)

Top GitHub Comments

julien-c commented, Apr 24, 2020 (1 reaction)

To be honest, someone should rewrite this script according to #3800

In case you want to do it @antmarakis (we can help) 😊

stale[bot] commented, Jun 24, 2020 (0 reactions)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.