label keyword argument in Span.merge has no effect

I am trying to add custom entities via the matcher's add_entity API, and I'd like to be able to specify their entity types. The label keyword passed to add_pattern does not appear to do anything.

(The following description looks long, but is the same code block repeated with minor changes.)

If I use the snippet provided in issue #523:

import spacy

def merge_phrase(matcher, doc, i, matches):
    '''
    Merge a phrase. We have to be careful here because we'll change the token indices.
    To avoid problems, merge all the phrases once we're called on the last match.
    '''
    if i != len(matches)-1:
        return None
    # Get Span objects
    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        span.merge(label=label, tag='NNP' if label else span.root.tag_)

nlp = spacy.load('en')
nlp.matcher.add_entity('MorganStanley', on_match=merge_phrase)
nlp.matcher.add_pattern('MorganStanley', [{'orth': 'Morgan'}, {'orth': 'Stanley'}], label='ORG')
nlp.pipeline = [nlp.tagger, nlp.entity, nlp.matcher, nlp.parser]

It looks promising:

# Okay, now we've got our pipeline set up...
doc = nlp(u'Morgan Stanley fires Vice President')
for word in doc:
    print(word.text, word.tag_, word.dep_, word.head.text, word.ent_type_)

Morgan Stanley NNP amod fires ORG
fires NNS ROOT fires 
Vice NNP compound President 
President NNP appos fires 
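
The docstring of merge_phrase warns that merging changes token indices. As a rough illustration of that concern only, here is a toy sketch on a plain Python list. It does not use spaCy, and the helper names merge_left_to_right and merge_right_to_left are hypothetical. It assumes half-open, non-overlapping (start, end) spans.

```python
def merge_left_to_right(tokens, spans):
    """Naive merge: each merge shortens the list, so later spans point at the wrong items."""
    merged = list(tokens)
    for start, end in sorted(spans):
        merged[start:end] = [" ".join(merged[start:end])]
    return merged


def merge_right_to_left(tokens, spans):
    """Merge the rightmost span first so the indices of pending spans stay valid."""
    merged = list(tokens)
    for start, end in sorted(spans, reverse=True):
        merged[start:end] = [" ".join(merged[start:end])]
    return merged


tokens = ["Morgan", "Stanley", "fires", "Vice", "President"]
spans = [(0, 2), (3, 5)]  # "Morgan Stanley" and "Vice President"

naive_result = merge_left_to_right(tokens, spans)
safe_result = merge_right_to_left(tokens, spans)

print(naive_result)  # ['Morgan Stanley', 'fires', 'Vice', 'President']
print(safe_result)   # ['Morgan Stanley', 'fires', 'Vice President']
```

In the naive version the second span (3, 5) no longer covers "Vice President" once the first merge has shortened the list. The snippet from issue #523 takes a different route to the same goal: it waits for the last match and builds all Span objects before merging any of them.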

However, I’m not sure the label='ORG' is actually doing anything. If I remove it, I get the same output.

nlp = spacy.load('en')
nlp.matcher.add_entity('MorganStanley', on_match=merge_phrase)
nlp.matcher.add_pattern('MorganStanley', [{'orth': 'Morgan'}, {'orth': 'Stanley'}])
nlp.pipeline = [nlp.tagger, nlp.entity, nlp.matcher, nlp.parser]

doc = nlp(u'Morgan Stanley fires Vice President')
for word in doc:
    print(word.text, word.tag_, word.dep_, word.head.text, word.ent_type_)

Morgan Stanley NNP amod fires ORG
fires NNS ROOT fires 
Vice NNP compound President 
President NNP appos fires

In fact, anything of the form ABC [Brothers/Limited/Company/Bank] gets labeled as an ‘ORG’. And I can’t get other patterns to be labeled ‘ORG’. E.g.:

nlp = spacy.load('en')
nlp.matcher.add_entity('MorganStanley', on_match=merge_phrase)
nlp.matcher.add_pattern('MorganStanley', [{'orth': 'State'}, {'orth': 'Street'}], label='ORG')
nlp.pipeline = [nlp.tagger, nlp.entity, nlp.matcher, nlp.parser]

doc = nlp(u'State Street fires Vice President')
for word in doc:
    print(word.text, word.tag_, word.dep_, word.head.text, word.ent_type_)

State Street NNP compound fires 
fires NNS ROOT fires 
Vice NNP compound President 
President NNP appos fires 

The .label_ property of every span in doc.ents is always ''.

How do I set the entity label? And what is the difference between doc.ents[0].label_ and doc[0].ent_type_?

  • Operating System: Debian 8
  • Python Version Used: 3.5
  • spaCy Version Used: 1.6
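
On the last question, one way to picture the distinction is that ent_type_ is an annotation on a single token while label_ belongs to a span covering one or more tokens. The following is a toy model in plain Python, not spaCy's implementation. The Token and Span classes and the spans_from_tokens helper are hypothetical, and whether spaCy 1.6 derives doc.ents from token annotations in exactly this way is an assumption. The sketch also ignores IOB boundaries, so adjacent entities with the same type would be fused.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    ent_type: str = ""  # token-level annotation (toy analogue of ent_type_)


@dataclass
class Span:
    start: int
    end: int
    label: str  # span-level annotation (toy analogue of label_)


def spans_from_tokens(tokens):
    """Group runs of adjacent tokens that share a non-empty ent_type into spans."""
    spans = []
    i = 0
    while i < len(tokens):
        label = tokens[i].ent_type
        if not label:
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j].ent_type == label:
            j += 1
        spans.append(Span(i, j, label))
        i = j
    return spans


toy_doc = [
    Token("Morgan", "ORG"),
    Token("Stanley", "ORG"),
    Token("fires"),
    Token("Vice"),
    Token("President"),
]
toy_ents = spans_from_tokens(toy_doc)

print(toy_doc[0].ent_type)  # ORG
print(toy_ents)             # [Span(start=0, end=2, label='ORG')]
```

Under this picture, a token can carry an entity type while a span reports an empty label if the two annotations are stored or updated separately, which is consistent with the symptom described above but is not a confirmed explanation.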

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
dlmiyamoto commented, Feb 27, 2017

It doesn’t seem that the label kwarg in the span.merge of the callback does anything. span.merge passes **attributes to doc.merge, but doc.merge doesn’t seem to do anything with a label kwarg. Is this a result of outdated documentation?
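
The behavior described in this comment can be reproduced in miniature without spaCy. The sketch below is an assumption-laden toy: span_merge and doc_merge are hypothetical stand-ins, and the set of recognized keys is chosen purely for illustration rather than taken from spaCy's source. It shows how a function that forwards **attributes to another function can silently drop a keyword the inner function never reads.

```python
RECOGNIZED_KEYS = ("tag", "lemma")  # illustrative assumption, not spaCy's actual list


def doc_merge(start, end, **attributes):
    """Hypothetical inner routine: reads only the keys it knows about."""
    applied = {}
    for key in RECOGNIZED_KEYS:
        if key in attributes:
            applied[key] = attributes[key]
    # Any other keyword in `attributes` is never looked at and raises no error.
    return applied


def span_merge(start, end, **attributes):
    """Hypothetical outer routine: forwards every keyword unchanged."""
    return doc_merge(start, end, **attributes)


applied = span_merge(0, 2, label="ORG", tag="NNP")
ignored = sorted(set(["label", "tag"]) - set(applied))

print(applied)  # {'tag': 'NNP'}
print(ignored)  # ['label']
```

Because nothing raises, the call looks successful, which matches the observation that the tag is applied while the label has no visible effect.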
