
Named Entity labels in spaCy

I have always used GiNZA with spaCy. In previous versions, the label set was small and fairly general (e.g. 渋谷 (Shibuya) and 日本 (Japan) were both classified as GPE). However, with version 3.1.0, I noticed that this has changed and the labels no longer follow those mappings. Is there any documentation listing all the NER labels used in GiNZA's spaCy implementation?
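One way to see which labels a given model actually produces is to collect them from its output over a sample of text. Below is a minimal sketch, assuming only that each doc exposes .ents and each entity exposes .label_ (as spaCy Doc and Span objects do). The helper name label_inventory is hypothetical, and the stand-in objects exist only so the demo runs without GiNZA installed; their labels are the 3.x.x outputs reported in this issue.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def label_inventory(docs: Iterable) -> Counter:
    """Count how often each entity label occurs across a collection of docs.

    Accepts spaCy Doc objects, or anything exposing .ents whose items
    have a .label_ attribute.
    """
    counts: Counter = Counter()
    for doc in docs:
        for ent in doc.ents:
            counts[ent.label_] += 1
    return counts


# Hypothetical stand-ins mimicking doc.ents, so the demo needs no model.
demo_docs = [
    SimpleNamespace(ents=[SimpleNamespace(label_="Person")]),     # 東京ディズニーシー
    SimpleNamespace(ents=[SimpleNamespace(label_="Company")]),    # エベレスト
    SimpleNamespace(ents=[SimpleNamespace(label_="GOE_Other")]),  # 東京ディズニーランド
]
print(label_inventory(demo_docs).most_common())
```

With GiNZA installed, the same function can be called as label_inventory(nlp.pipe(texts)). It only reports labels that occur in the sample, not the complete label set; nlp.get_pipe("ner").labels may give a fuller list, assuming the pipeline contains a component named "ner".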

Also, I noticed that some entities that were previously classified more or less correctly are misclassified in version 3.x.x. For example, with the following code:

import spacy
nlp = spacy.load('ja_ginza')

# 'I went to Tokyo DisneySea'
doc = nlp('私は東京ディズニーシーへ行った')
for ent in doc.ents:
    print(ent.text, ent.label_)
# ver 2.x.x yields 東京ディズニーシー ORG
# ver 3.x.x yields 東京ディズニーシー Person

# 'How tall is Mount Everest?'
doc = nlp('エベレストの高さはどのぐらい')
for ent in doc.ents:
    print(ent.text, ent.label_)
# ver 2.x.x yields エベレスト LOC
# ver 3.x.x yields エベレスト Company
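When upgrading, a small table of expected labels makes changes like these easy to spot. The sketch below records the labels reported above for 2.x.x and 3.x.x and lists the differences; V2_LABELS, V3_LABELS and label_changes are hypothetical names. A raw diff cannot tell a genuine misclassification apart from a change of label vocabulary (for example ORG versus Company), so treat its output as a list to review rather than a verdict.

```python
from typing import Dict, List, Optional, Tuple

# Keys are (input sentence, entity text); values are the labels reported in this issue.
V2_LABELS: Dict[Tuple[str, str], str] = {
    ("私は東京ディズニーシーへ行った", "東京ディズニーシー"): "ORG",
    ("エベレストの高さはどのぐらい", "エベレスト"): "LOC",
}
V3_LABELS: Dict[Tuple[str, str], str] = {
    ("私は東京ディズニーシーへ行った", "東京ディズニーシー"): "Person",
    ("エベレストの高さはどのぐらい", "エベレスト"): "Company",
}


def label_changes(
    old: Dict[Tuple[str, str], str], new: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, Optional[str]]]:
    """List (entity text, old label, new label) for every entity whose label changed.

    The new label is None when the entity is no longer detected at all.
    """
    changes = []
    for (sentence, entity), old_label in old.items():
        new_label = new.get((sentence, entity))
        if new_label != old_label:
            changes.append((entity, old_label, new_label))
    return changes


for entity, before, after in label_changes(V2_LABELS, V3_LABELS):
    print(f"{entity}: {before} -> {after}")
```

In practice the new table could be filled from model output, e.g. {(text, ent.text): ent.label_ for text in texts for ent in nlp(text).ents}.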

Thank you very much for your work.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
hiroshi-matsuda-rit commented, Jan 22, 2020

@tomgun132 GSK2014-A uses some labels that are defined in version 7 of Sekine's Extended Named Entity (ENE) hierarchy, for example: https://sites.google.com/site/extendednamedentity711/top以下の階層の全リスト/0-top-top/1-名前-name/1-5-shi-she-ming-facility/1-5-3-goe-goe/1-5-3-0-goe-sono-ta-goe-other

It also contains some misspelled labels. The full list of ENE labels used in GSK2014-A is here: https://github.com/megagonlabs/ginza/blob/develop/ginza/ent_type_mapping.py#L13
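If coarse labels like those in earlier versions are needed, one option is to collapse the fine-grained ENE labels through a lookup table. The table below is a hypothetical, partial sketch and is not the contents of ginza/ent_type_mapping.py: GOE_Other → FAC and Company → ORG are inferred from the token-level tags in tomgun132's outputs in this thread (GOE also sits under Facility in the ENE v7 hierarchy linked above), while Person → PERSON is an assumption.

```python
from typing import Dict

# HYPOTHETICAL, partial table for illustration only; this is NOT the
# contents of ginza/ent_type_mapping.py.
ENE_TO_COARSE: Dict[str, str] = {
    # ENE v7: Name > Facility > GOE; its tokens are tagged B_FAC/I_FAC in this thread
    "GOE_Other": "FAC",
    # its tokens are tagged B_ORG/I_ORG in this thread
    "Company": "ORG",
    # assumption: not confirmed anywhere in this thread
    "Person": "PERSON",
}


def to_coarse(ene_label: str) -> str:
    """Map a fine-grained ENE label to a coarse label.

    Labels missing from the table are returned unchanged, so nothing
    is silently dropped.
    """
    return ENE_TO_COARSE.get(ene_label, ene_label)


for label in ["GOE_Other", "Company", "Person", "Unmapped_Label"]:
    print(label, "->", to_coarse(label))
```

Applied to a parsed document this would look like [(ent.text, to_coarse(ent.label_)) for ent in doc.ents]. Passing unknown labels through unchanged makes gaps in the table visible instead of hiding them.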

1 reaction
tomgun132 commented, Jan 20, 2020

Thank you very much for your reply.

I tried 東京ディズニーランド (Tokyo Disneyland) and 東京ディズニーリゾート (Tokyo Disney Resort) with the same code as before. Here are the results:

# 'I went to Tokyo Disneyland'
doc = nlp('東京ディズニーランドへ行った')
for ent in doc.ents:
    print(ent.text, ent.label_)
for sent in doc.sents:
    for token in sent:
        print(token.i, token.orth_, token.lemma_, token.pos_, token.tag_, token.dep_, token.head.i, token._.ne)
    print('EOS')
# 東京ディズニーランド GOE_Other
# 0 東京 東京 PROPN 名詞-固有名詞-地名-一般 compound 1 B_FAC
# 1 ディズニーランド ディズニーランド NOUN 名詞-固有名詞-一般 nmod 3 I_FAC
# 2 へ へ ADP 助詞-格助詞 case 1 
# 3 行っ 行く VERB 動詞-非自立可能 ROOT 3 
# 4 た た AUX 助動詞 aux 3 
# EOS

# 'I went to Tokyo Disney Resort'
doc = nlp('東京ディズニーリゾートへ行った')
for ent in doc.ents:
    print(ent.text, ent.label_)
for sent in doc.sents:
    for token in sent:
        print(token.i, token.orth_, token.lemma_, token.pos_, token.tag_, token.dep_, token.head.i, token._.ne)
    print('EOS')
# 東京ディズニーリゾート Company
# 0 東京 東京 PROPN 名詞-固有名詞-地名-一般 compound 2 B_ORG
# 1 ディズニー ディズニー PROPN 名詞-固有名詞-人名-一般 compound 2 I_ORG
# 2 リゾート リゾート NOUN 名詞-普通名詞-一般 nmod 4 I_ORG
# 3 へ へ ADP 助詞-格助詞 case 2 
# 4 行っ 行く VERB 動詞-非自立可能 ROOT 4 
# 5 た た AUX 助動詞 aux 4 
# EOS
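The token._.ne column uses BIO-style tags with coarse labels (B_FAC, I_ORG and so on), while ent.label_ carries the fine-grained ENE label (GOE_Other, Company). Below is a minimal sketch that groups the token tags back into spans, assuming the tag format printed above: "B_<LABEL>" starts an entity, "I_<LABEL>" continues it, and an empty string marks a token outside any entity. decode_bio is a hypothetical helper, and the input pairs are copied by hand from the outputs above rather than read from token._.ne.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[Tuple[str, str]], joiner: str = "") -> List[Tuple[str, str]]:
    """Group (token text, BIO tag) pairs into (span text, label) entities.

    Japanese has no spaces between tokens, so pieces are joined with "" by default.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the span being built, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((joiner.join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for text, tag in tokens:
        # "B_GOE_Other" -> ("B", "_", "GOE_Other"); "" -> ("", "", "")
        prefix, _, label = tag.partition("_")
        if prefix == "B":
            flush()
            current_tokens, current_label = [text], label
        elif prefix == "I" and label == current_label:
            current_tokens.append(text)
        elif prefix == "I":
            # I_ tag without a matching open span: start a new span leniently
            flush()
            current_tokens, current_label = [text], label
        else:
            flush()
    flush()
    return spans


# Token texts and tags copied from the two outputs above.
land = [("東京", "B_FAC"), ("ディズニーランド", "I_FAC"), ("へ", ""), ("行っ", ""), ("た", "")]
resort = [("東京", "B_ORG"), ("ディズニー", "I_ORG"), ("リゾート", "I_ORG"),
          ("へ", ""), ("行っ", ""), ("た", "")]
print(decode_bio(land))
print(decode_bio(resort))
```

Decoding gives 東京ディズニーランド as FAC and 東京ディズニーリゾート as ORG, so the token tags and ent.label_ describe the same spans at two levels of granularity.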

Interestingly, Disney Resort is tagged as ORG while Disneyland is tagged as FAC in the token-level token._.ne tags. However, for Disneyland the ent.label_ value is GOE_Other, and when I searched Sekine's Extended Named Entity page, I could not find GOE.
