Finetuned BERT model does not seem to predict the right labels / work properly?
❓ Questions & Help
I am trying out a finetuned BERT model for token classification (https://huggingface.co/bert-base-cased-finetuned-conll03-english), but when I look at the model output (i.e. the logits after applying softmax) and compare it with the true label_ids, they are completely uncorrelated (see pictures below).
Screenshots: https://i.stack.imgur.com/gVyMn.png and https://i.stack.imgur.com/qS62L.png
Details
I assume that the finetuned model (bert-base-cased-finetuned-conll03-english) was trained correctly, but I don't understand why its predictions are off. I think one issue might be that the pretrained model uses a different labelling scheme than the one I made myself during data preparation (so that my tag2name dict is different), but I don't know how to find out which label-index map the model uses for its predictions. Even so, the model does not consistently make the same mistakes; it outputs things quite randomly.
Any idea what the issue could be?
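For illustration, here is a minimal sketch (not the original poster's code, and assuming a recent `transformers` version) of the setup described above. In particular, the label-index map that a checkpoint uses at prediction time can be read from `model.config.id2label`:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Checkpoint name taken from the question.
model_name = "bert-base-cased-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# The label-index map the model uses is stored in its config,
# so it can be compared against a hand-made tag2name dict.
print(model.config.id2label)
print(model.config.label2id)

inputs = tokenizer("Hugging Face is based in New York City.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred.item()])
```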
Hi! From my experience using the community-contributed `dbmdz/bert-large-cased-finetuned-conll03-english` (which is the same checkpoint as `bert-large-cased-finetuned-conll03-english`), using the `bert-base-cased` tokenizer instead of the tokenizer loaded from that checkpoint works better. You can see an example of this in the usage section; let me know if it helps.
I suspect the difference between the tokenizers is due to a lowercasing of all inputs. I’m looking into it now.
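One quick way to check that suspicion is to tokenize the same cased sentence with both tokenizers and compare the outputs; a small sketch under the same assumptions:

```python
from transformers import AutoTokenizer

tok_base = AutoTokenizer.from_pretrained("bert-base-cased")
tok_ckpt = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

# If one tokenizer lowercases its input, the difference shows up immediately.
sentence = "New York City"
print(tok_base.tokenize(sentence))  # expected to keep casing, e.g. ['New', 'York', 'City']
print(tok_ckpt.tokenize(sentence))  # lowercased output would indicate the mismatch
```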
PS: the file `bert-large-cased-finetuned-conll03-english` is deprecated in favor of the aforementioned `dbmdz/bert-large-cased-finetuned-conll03-english`, as they are duplicates. @julien-c is currently deleting it from S3, so please use the `dbmdz` file/folder.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.