run_tf_ner.py doesn't work with unlabelled test data
When running `run_tf_ner.py` in predict mode, if all the labels in the test data are `O`, the script errors out with:

```
File "/home/himanshu/.local/lib/python3.7/site-packages/numpy/lib/function_base.py", line 423, in average
    "Weights sum to zero, can't be normalized")
ZeroDivisionError: Weights sum to zero, can't be normalized
```

This is because `pad_token_label_id` (https://github.com/huggingface/transformers/blob/cae334c43c49aa770d9dac1ee48319679ee8c72c/examples/ner/run_tf_ner.py#L511) and the `label_id` for `O` are both zero, so every position is filtered out and `y_pred` ends up empty (https://github.com/huggingface/transformers/blob/cae334c43c49aa770d9dac1ee48319679ee8c72c/examples/ner/run_tf_ner.py#L364-L367). Shouldn't `pad_token_label_id` be different?
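To make the failure concrete, here is a minimal, self-contained sketch of the problem. It is a paraphrase of the filtering around the linked lines, not the script's exact code; `label_map`, `label_ids`, and `preds` are stand-in names. Because positions whose gold id equals `pad_token_label_id` are skipped, an all-`O` test file leaves `y_true`/`y_pred` empty, and the downstream weighted average then fails with exactly the reported `ZeroDivisionError`:

```python
import numpy as np

# Stand-in values mirroring the situation described above:
# pad_token_label_id is 0, and the label map also assigns 0 to "O".
pad_token_label_id = 0
label_map = {0: "O", 1: "B-PER", 2: "I-PER"}

# Gold label ids for an all-"O" test file, plus arbitrary model predictions.
label_ids = np.zeros((2, 5), dtype=int)        # every gold label is "O" -> id 0
preds = np.random.randint(0, 3, size=(2, 5))   # whatever the model predicts

y_true = [[] for _ in range(label_ids.shape[0])]
y_pred = [[] for _ in range(label_ids.shape[0])]

# Paraphrase of the filtering loop: keep only positions that are not "padding".
for i in range(label_ids.shape[0]):
    for j in range(label_ids.shape[1]):
        if label_ids[i, j] != pad_token_label_id:   # never true for all-"O" data
            y_true[i].append(label_map[label_ids[i, j]])
            y_pred[i].append(label_map[preds[i, j]])

print(y_true, y_pred)  # [[], []] [[], []] -- every token was treated as padding

# Downstream, the metrics end up averaging with weights that sum to zero,
# which raises the exact error from the traceback:
np.average([0.0, 0.0], weights=[len(seq) for seq in y_pred])
# ZeroDivisionError: Weights sum to zero, can't be normalized
```

Any sentinel that cannot collide with a real label id would avoid this.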
Issue Analytics
- Created: 4 years ago
- Comments: 9 (4 by maintainers)
I have noticed the same issue and posted a question here: https://stackoverflow.com/questions/60732509/label-handling-confusion-in-run-tf-ner-example

I think `pad_token_label_id` should definitely not fall into the range of actual labels. Maybe we can make it `-1` or `num(label)` or something. Also, as shown in `convert_examples_to_features()`, `pad_token_label_id` is not only used for pad tokens at the end of the sequence, but also for the non-first tokens inside a word when the word is split into multiple tokens. Accordingly, during prediction, only the label of the first token in each word is used. So I am wondering if we should modify `input_mask` so that the loss does not take non-first tokens in a word into account.

I tried setting `pad_token_label_id = -1`, masking out non-first tokens in each word by changing `input_mask`, and changing `num_labels` to `len(labels)` instead of `len(labels) + 1` (a rough sketch of these changes is shown below). Training and evaluation can run, but the F1 score on the test set becomes much lower (on both CoNLL-2003 English and OntoNotes English). I am still confused about this.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.