Non-NER tags are missing one letter

How to reproduce the behaviour

If I run the following code with POS tags:

from seqeval.metrics import classification_report

y_pred = [['ADJ', 'CONJ', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'SCONJ', 'X']]
y_true = [['ADJ', 'DET', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'ART', 'X']]
print(classification_report(y_true, y_pred))

What I get is:

              precision    recall  f1-score   support

        CONJ       0.50      1.00      0.67         1
          DJ       1.00      1.00      1.00         2
         ERB       1.00      1.00      1.00         1
          ET       0.00      0.00      0.00         1
         ONJ       0.50      1.00      0.67         1
         OUN       1.00      1.00      1.00         1
          RT       0.00      0.00      0.00         1
          UX       1.00      1.00      1.00         1

   micro avg       0.78      0.78      0.78         9
   macro avg       0.62      0.75      0.67         9
weighted avg       0.67      0.78      0.70         9

Here, every tag is missing its first letter. If I pass suffix=True, the tags lose their last letter instead:

              precision    recall  f1-score   support

          AD       1.00      1.00      1.00         2
          AR       0.00      0.00      0.00         1
          AU       1.00      1.00      1.00         1
         CON       0.50      1.00      0.67         1
          DE       0.00      0.00      0.00         1
         NOU       1.00      1.00      1.00         1
        SCON       0.50      1.00      0.67         1
         VER       1.00      1.00      1.00         1

   micro avg       0.78      0.78      0.78         9
   macro avg       0.62      0.75      0.67         9
weighted avg       0.67      0.78      0.70         9

Moreover, one-letter tags (such as X) are left out of the report entirely.
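
The truncation is consistent with a parser that assumes IOB-style tags and treats the first character (or the last one, with suffix=True) as the chunk prefix. The sketch below is a simplified illustration of that assumption, not seqeval's actual code; `split_tag` is a hypothetical helper written only for this example.

```python
from typing import Tuple


def split_tag(tag: str, suffix: bool = False) -> Tuple[str, str]:
    """Split a tag into (prefix, type), assuming IOB-style labels.

    Hypothetical helper: it mimics the behaviour described in this issue,
    where one character is always taken as the IOB prefix.
    """
    if suffix:
        # Prefix is the last character, e.g. 'PER-B' -> ('B', 'PER')
        prefix = tag[-1]
        type_ = tag[:-1].rsplit("-", 1)[0]
    else:
        # Prefix is the first character, e.g. 'B-PER' -> ('B', 'PER')
        prefix = tag[0]
        type_ = tag[1:].split("-", 1)[-1]
    return prefix, type_


print(split_tag("B-PER"))               # ('B', 'PER')
print(split_tag("ADJ"))                 # ('A', 'DJ')
print(split_tag("SCONJ", suffix=True))  # ('J', 'SCON')
print(split_tag("X"))                   # ('X', '')
```

Under this assumption a POS tag such as ADJ loses one letter, and a one-letter tag such as X is left with an empty type, which may be why it does not show up in the report.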

Your Environment

  • Operating System: Ubuntu 20.10
  • Python Version: Python 3.8.6
  • Package Version: seqeval==1.2.2

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 5
  • Comments: 7

Top GitHub Comments

2 reactions
liaeh commented, Sep 2, 2021

> Thanks for finding the key line, @liaeh! As I see it then, we only have two options:
>
> 1. We re-label our datasets if not IOB-style to start each label with `B`.
> 2. We add an option to the library to not remove the first character if not IOB-style.

Option 2 would make most sense! I’ve been using option 1 as a workaround though 😃

0 reactions
versae commented, Aug 30, 2021

Thanks for finding the key line, @liaeh! As I see it then, we only have two options:

  1. We re-label our datasets if not IOB-style to start each label with B.
  2. We add an option to the library to not remove the first character if not IOB-style.
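
Option 1 can be sketched as a small re-labeling step applied before scoring. This is only a minimal illustration: `to_iob_style` is a hypothetical helper, and the choice of `B-` as the prefix is an assumption based on the comment above. The call to seqeval is left commented out so the snippet runs without the library installed.

```python
from typing import List


def to_iob_style(sequences: List[List[str]], prefix: str = "B-") -> List[List[str]]:
    """Prepend an IOB-style prefix to every tag in every sequence.

    Hypothetical workaround helper: each token becomes its own
    single-token chunk, so the original tag name survives prefix stripping.
    """
    return [[f"{prefix}{tag}" for tag in seq] for seq in sequences]


y_pred = [['ADJ', 'CONJ', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'SCONJ', 'X']]
y_true = [['ADJ', 'DET', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'ART', 'X']]

relabeled_pred = to_iob_style(y_pred)
relabeled_true = to_iob_style(y_true)
print(relabeled_pred[1])  # ['B-CONJ', 'B-SCONJ', 'B-X']

# With seqeval installed, the re-labeled sequences can be scored as before:
# from seqeval.metrics import classification_report
# print(classification_report(relabeled_true, relabeled_pred))
```

With every tag prefixed this way, the report should list the full tag names, and one-letter tags such as X are no longer reduced to an empty type.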