CoNLL2003 ner_tags order mismatch between the dataset from HF and the pretrained model
@dslim23's pretrained models, such as:
https://huggingface.co/dslim/bert-base-NER
have the following NER tag order baked in:
"O", "B-MISC", "I-MISC", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"
while the https://huggingface.co/datasets/conll2003 dataset has:
O (0), B-PER (1), I-PER (2), B-ORG (3), I-ORG (4), B-LOC (5), I-LOC (6), B-MISC (7), I-MISC (8)
Because of this mismatch, accuracy measurements for the pretrained NER models are wrong out of the box. Try, for instance:
python examples/pytorch/token-classification/run_ner.py --model_name_or_path dslim/bert-base-NER --dataset_name conll2003 --output_dir /tmp/test-ner --do_eval
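One possible workaround is to remap the dataset's tag ids to the model's ids by label name before evaluating. The snippet below is only a minimal sketch of that idea, not what run_ner.py does: the two label lists are copied from this issue, the helper names build_id_remap and remap_tags are hypothetical, and the -100 ignore value is assumed to be the padding label used for sub-word tokens.

```python
# Label order the model uses, as quoted in this issue.
MODEL_LABELS = ["O", "B-MISC", "I-MISC", "B-PER", "I-PER",
                "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
# Label order of the conll2003 dataset, as quoted in this issue.
DATASET_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
                  "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def build_id_remap(dataset_labels, model_labels):
    """Return a list where position i holds the model id for dataset id i.

    Hypothetical helper: matching is done purely by label name.
    """
    if set(dataset_labels) != set(model_labels):
        raise ValueError("label sets differ; cannot remap by name")
    model_label2id = {name: idx for idx, name in enumerate(model_labels)}
    return [model_label2id[name] for name in dataset_labels]


def remap_tags(tag_ids, remap, ignore_index=-100):
    """Translate one sequence of dataset tag ids into model tag ids.

    Ids equal to ignore_index (assumed padding value) are left untouched.
    """
    return [t if t == ignore_index else remap[t] for t in tag_ids]


DATASET_TO_MODEL = build_id_remap(DATASET_LABELS, MODEL_LABELS)

# Example: dataset tags for B-PER, I-PER, O, <ignored>, B-MISC.
print(remap_tags([1, 2, 0, -100, 7], DATASET_TO_MODEL))
```

In practice the model-side list would more safely be read from the model's config than hard-coded, so that the remap follows whatever order a given checkpoint declares.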
@patrickvonplaten thanks for the ping. In this case, though, shouldn't it be the script that remaps the labels? The model looks correctly defined in https://huggingface.co/dslim/bert-base-NER/blob/main/config.json
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.