
A bug in the padding of input examples in the NER fine-tuning example

See original GitHub issue

šŸ› Bug

Information

Model I am using (Bert, XLNet …): Roberta

Language I am using the model on (English, Chinese …): English

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. TODO

Expected behavior

https://github.com/huggingface/transformers/blob/c59b1e682d6ebaf7295c63418d4570228904e690/examples/ner/utils_ner.py#L123 This line is supposed to return 3 for Roberta models, but it is only returning 2, which lets the length of the input_ids exceed max_seq_len. This might be the reason: https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_roberta.py#L288 TODO: Share the notebook.
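For context, a simplified sketch of the pattern the example follows and why an undercounted special-token budget lets input_ids overshoot; it assumes a transformers install recent enough to expose num_special_tokens_to_add, and the sentence and max_seq_len are made up:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
max_seq_len = 8
words = ["Alice", "visited", "the", "Eiffel", "Tower", "in", "Paris", "today"]

# Condensed mirror of the utils_ner.py logic linked above: budget
# max_seq_len - special_tokens_count word pieces, then assemble
# <s> ... </s> </s> by hand (the example adds an extra separator for Roberta).
tokens = [t for word in words for t in tokenizer.tokenize(word)]
special_tokens_count = tokenizer.num_special_tokens_to_add()
if len(tokens) > max_seq_len - special_tokens_count:
    tokens = tokens[: max_seq_len - special_tokens_count]

tokens = [tokenizer.cls_token] + tokens + [tokenizer.sep_token, tokenizer.sep_token]
input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(len(input_ids), max_seq_len)  # one too long whenever the method returns 2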

Environment info

  • transformers version: 2.8.0
  • Platform: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.4.0 (True)
  • Tensorflow version (GPU?): 2.2.0-rc2 (True)
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
nbroad1881 commented, Apr 29, 2020

@TarasPriadka, @AMR-KELEG

I had a similar issue using preprocess.py on an NER dataset.

Traceback (most recent call last):
  File "preprocess.py", line 12, in <module>
    max_len -= tokenizer.num_special_tokens_to_add()
AttributeError: 'BertTokenizer' object has no attribute 'num_special_tokens_to_add'

I think the PyPI package hasn't been updated, so pip install transformers won't have the files you need. I built from source and the errors went away. If you try building from source, I think your problem might go away too.
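For anyone hitting the same AttributeError, a quick sanity check (just a sketch; the model name is arbitrary) before running preprocess.py:

# Confirm the installed transformers exposes the method preprocess.py calls.
# If this prints False, install from source, for example:
#   pip install git+https://github.com/huggingface/transformers.git
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
print(hasattr(tokenizer, "num_special_tokens_to_add"))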

1 reaction
TarasPriadka commented, Apr 17, 2020

I had an issue running the NER model. In this commit https://github.com/huggingface/transformers/commit/96ab75b8dd48a9384a74ba4307a4ebfb197343cd num_added_tokens got changed into num_special_tokens_to_add, and my error was the old name not being found. Just changing the name of the call in utils_ner.py fixed the issue for me. Let me know if this fixes your problem.
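As a hedged sketch of that rename (the surrounding utils_ner.py code is omitted), a version-tolerant variant simply calls whichever name the installed tokenizer provides:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# Use the new name where available, otherwise fall back to the pre-rename one.
if hasattr(tokenizer, "num_special_tokens_to_add"):
    special_tokens_count = tokenizer.num_special_tokens_to_add()
else:
    special_tokens_count = tokenizer.num_added_tokens()
print(special_tokens_count)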

Read more comments on GitHub >

Top Results From Across the Web

Fine-tuning with custom datasets - Hugging Face
We'll pass truncation=True and padding=True, which will ensure that all of our sequences are padded to the same length and are truncated...
Read more >
How to Fine-Tune BERT for NER Using HuggingFace
How to Pad the Samples. Another issue is that different samples can get tokenized into different lengths, so we need to add pad tokens...
Read more >
BERT Fine-Tuning Tutorial with PyTorch - Chris McCormick
In this tutorial I'll show you how to use BERT with the huggingface ... Side Note: The input format to BERT seems "over-specified"...
Read more >
Make The Most of Your Small NER Data Set by Fine-tuning Bert
We first generate a mask for the padding tokens, then we feed the input to the BERT model. We extract the last hidden...
Read more >
Padding for NLP. Why and what ? | by Caner - Medium
padding="post": add the zeros at the end of the sequence to make the samples the same size · maxlen=8: this...
Read more >
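
The results above all land on the same point: pad and truncate every sequence to one fixed length so input_ids never exceeds the model limit. A minimal sketch, assuming a transformers release recent enough to accept padding and truncation flags in the batched tokenizer call (the sentences are placeholders):

# Pad the short example and truncate the long one so both come back at the
# same length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch = tokenizer(
    ["Paris is nice", "A much longer sentence that would otherwise overflow the limit"],
    padding=True,
    truncation=True,
    max_length=12,
)
print([len(ids) for ids in batch["input_ids"]])  # equal lengths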
