spaCy Matcher not working on a long word combination


I have this text:

text = "What factors is AdvisorShares STAR Global Buy-Write ETF exposed with?"
doc = nlp(text)
matcher(doc)

and I add the pattern to the matcher with:

matcher.add(entity_key='company-1', label='COMPANY', attrs={}, specs=spec, on_match=merge_phrases)

# spec (65 is spaCy's attribute ID for ORTH):
# [[{65: 'AdvisorShares'}, {65: 'STAR'}, {65: 'Global'}, {65: 'Buy-Write'}, {65: 'ETF'}], [{65: 'Advisorshares'}, {65: 'Star'}, {65: 'Global'}, {65: 'Buy-Write'}, {65: 'Etf'}]]

But the system doesn't recognize "AdvisorShares STAR Global Buy-Write ETF" as an entity of type COMPANY.
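
A likely explanation, consistent with the reply below: in current spaCy versions the English tokenizer splits "Buy-Write" at the hyphen into "Buy", "-", "Write", so no single token has ORTH "Buy-Write" and the five-token pattern can never line up. The sketch below assumes spaCy v3 and a blank English pipeline (no trained model); the question's matcher.add(entity_key=..., specs=...) call is the old spaCy 1.x signature, which no longer exists. It prints the tokens and then matches with a pattern that mirrors the split, using LOWER so both capitalizations in the spec are covered.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")

text = "What factors is AdvisorShares STAR Global Buy-Write ETF exposed with?"
doc = nlp(text)

# The infix rules split "Buy-Write" at the hyphen, so there is
# no single token whose ORTH is "Buy-Write".
tokens = [t.text for t in doc]
print(tokens)

# A token pattern that follows the tokenizer's output instead.
# LOWER makes each token comparison case-insensitive, covering both
# "AdvisorShares STAR ..." and "Advisorshares Star ..." from the spec.
pattern = [
    {"LOWER": "advisorshares"},
    {"LOWER": "star"},
    {"LOWER": "global"},
    {"LOWER": "buy"},
    {"ORTH": "-"},
    {"LOWER": "write"},
    {"LOWER": "etf"},
]

matcher = Matcher(nlp.vocab)
matcher.add("COMPANY", [pattern])

# Each match is (match_id, start, end); slice the Doc to get the text.
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```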

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
Gregory-Howard commented, Apr 11, 2017

Hi, see https://spacy.io/docs/usage/customizing-tokenizer on adding special cases to the tokenizer. Not tested, but something like this should do the job:

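from spacy.symbols import ORTH, LEMMA, POS

# Keep "Buy-Write" as one token so the ORTH pattern can match it
nlp.tokenizer.add_special_case(u'Buy-Write', [
    {ORTH: u'Buy-Write', LEMMA: u'buy-write', POS: u'NOUN'}
])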
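
The special-case idea above can be sketched for spaCy v3 as follows. This is an assumption-laden adaptation, not the reply's tested code: v3 special cases may only set ORTH and NORM, so the LEMMA and POS values are dropped. The on_match callback add_company_ent is a hypothetical name; it plays the role of the question's merge_phrases by labeling each match as a COMPANY entity.

```python
import spacy
from spacy.matcher import Matcher
from spacy.symbols import ORTH
from spacy.tokens import Span

nlp = spacy.blank("en")

# spaCy v3 special cases may only set ORTH and NORM, so the LEMMA and
# POS values from the reply are left out here.
nlp.tokenizer.add_special_case("Buy-Write", [{ORTH: "Buy-Write"}])

doc = nlp("What factors is AdvisorShares STAR Global Buy-Write ETF exposed with?")
tokens = [t.text for t in doc]  # "Buy-Write" is now a single token


def add_company_ent(matcher, doc, i, matches):
    # Hypothetical on_match callback: label the i-th match as COMPANY.
    _, start, end = matches[i]
    doc.ents += (Span(doc, start, end, label="COMPANY"),)


# LOWER matching covers both capitalizations listed in the question's spec.
words = ["advisorshares", "star", "global", "buy-write", "etf"]
matcher = Matcher(nlp.vocab)
matcher.add("COMPANY", [[{"LOWER": w} for w in words]], on_match=add_company_ent)

matcher(doc)  # calls add_company_ent once per match
print([(ent.text, ent.label_) for ent in doc.ents])
```

Note that a special case is an exact-string rule: it fires for "Buy-Write" only, so a lowercase "buy-write" in the input would still be split at the hyphen.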

0 reactions
lock[bot] commented, May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.


Top Results From Across the Web

Rule-based matching · spaCy Usage Documentation
To match large terminology lists, you can use the PhraseMatcher, which accepts Doc objects as match patterns. Adding patterns. Let's say we...

Trivial example using spaCy Matcher not working
I'm trying to get the following simple example using the spaCy Matcher working: import en_core_web_sm from spacy ...

Issue with several optional rule in Token Matcher #3951
A simple solution for you is to not try to put all the rules in one pattern but simply add multiple. import spacy...

Rule-Based Matching with spaCy - Medium
The scenario here is we want to match two words starting with the word 'computer' followed by a word where its part-of-speech tag...

How To Annotate Entities With Spacy PhraseMacher
The problem is that our PhraseMatcher finds both forms of the company name, the longer complete version, and the shorter, cleaner version.
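
The PhraseMatcher mentioned in the first result sidesteps the hyphen problem entirely: the pattern is itself a Doc built by the same tokenizer, so "Buy-Write" is split the same way in the pattern and in the text. A minimal sketch, assuming spaCy v3 and a blank English pipeline; company_names is a hypothetical list standing in for a real terminology list.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")

# attr="LOWER" compares lowercase token text, so one pattern covers both
# "AdvisorShares STAR Global Buy-Write ETF" and "Advisorshares Star Global Buy-Write Etf".
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

# Hypothetical terminology list. make_doc runs only the tokenizer, so each
# pattern is tokenized exactly like the text it will be matched against.
company_names = ["AdvisorShares STAR Global Buy-Write ETF"]
matcher.add("COMPANY", [nlp.make_doc(name) for name in company_names])

doc = nlp("What factors is Advisorshares Star Global Buy-Write Etf exposed with?")
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```

For long lists of multi-word names this is usually simpler than writing one token pattern per name, since no pattern has to anticipate how the tokenizer splits punctuation.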
