In run_xnli.py, output_dir seems to be used in place of tokenizer_name

🐛 Bug

Information

I am trying to run the run_xnli example as found in the documentation. Unfortunately, I get a strange error where the script treats the output_dir argument as a model name.

It seems that output_dir has been used in place of tokenizer_name in some instances, such as this: tokenizer = tokenizer_class.from_pretrained(args.output_dir, do_lower_case=args.do_lower_case)
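To illustrate the behavior I expected, here is a minimal sketch of choosing the tokenizer source from tokenizer_name, falling back to model_name_or_path, without ever consulting output_dir. The helper name resolve_tokenizer_source is hypothetical and not part of run_xnli.py or transformers; the argument values are only illustrative.

```python
from argparse import Namespace


def resolve_tokenizer_source(args: Namespace) -> str:
    """Return the name or path the tokenizer should be loaded from.

    Hypothetical helper: prefers an explicit tokenizer_name and otherwise
    falls back to model_name_or_path. output_dir is deliberately ignored.
    """
    tokenizer_name = getattr(args, "tokenizer_name", "") or ""
    return tokenizer_name or args.model_name_or_path


# Illustrative arguments (assumed values, not taken from a real run).
args = Namespace(
    model_name_or_path="bert-base-multilingual-cased",
    tokenizer_name="",
    output_dir="/tmp/debug_xnli/",
)
source = resolve_tokenizer_source(args)
print(source)
```

The returned string could then be passed as the first argument to tokenizer_class.from_pretrained in place of args.output_dir. Whether that is the right fix for every code path in the script is an assumption on my part.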

To reproduce

Steps to reproduce the behavior:

  1. Follow the example as found here: https://huggingface.co/transformers/examples.html#xnli

I get the following error:

Traceback (most recent call last):
  File "run_xnli.py", line 646, in <module>
    main()
  File "run_xnli.py", line 624, in main
    tokenizer = tokenizer_class.from_pretrained(args.output_dir, do_lower_case=args.do_lower_case)
  File "/mounts/Users/cisintern/antmarakis/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 868, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/mounts/Users/cisintern/antmarakis/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 971, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name '/tmp/debug_xnli/' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed '/tmp/debug_xnli/' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
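
The error message says no file named vocab.txt was found at the given path. A quick way to confirm this before calling from_pretrained is sketched below. The helper missing_vocab_files is hypothetical, and the default file name is taken from the error message above.

```python
import os
import tempfile


def missing_vocab_files(path, expected=("vocab.txt",)):
    """Return the expected vocabulary file names that are absent from `path`."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(path, name))
    ]


# Demonstration on a freshly created, empty directory.
with tempfile.TemporaryDirectory() as empty_dir:
    print(missing_vocab_files(empty_dir))
```

A non-empty result for the output directory suggests that no tokenizer has been saved there yet, which would be consistent with the OSError above.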

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
julien-c commented, Apr 24, 2020

To be honest, someone should rewrite this script according to #3800

In case you want to do it @antmarakis (we can help) 😊

0 reactions
stale[bot] commented, Jun 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
