spaCy Matcher not working on long word combination
I had this text:
text = "What factors is AdvisorShares STAR Global Buy-Write ETF exposed with?"
doc = nlp(text)
matcher(doc)
and added the pattern to the matcher with:
matcher.add(entity_key='company-1', label='COMPANY', attrs={}, specs=spec, on_match=merge_phrases)
#[[{65: 'AdvisorShares'}, {65: 'STAR'}, {65: 'Global'}, {65: 'Buy-Write'}, {65: 'ETF'}], [{65: 'Advisorshares'}, {65: 'Star'}, {65: 'Global'}, {65: 'Buy-Write'}, {65: 'Etf'}]]
But the system can't recognize "AdvisorShares STAR Global Buy-Write ETF" as an entity of type COMPANY.
Issue Analytics
- State:
- Created 6 years ago
- Comments:6 (2 by maintainers)
Top GitHub Comments
Hi, see https://spacy.io/docs/usage/customizing-tokenizer for adding special cases to the tokenizer. Not tested, but something like this can do the job:
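The commenter's snippet is not preserved in this copy of the thread. What follows is only a minimal sketch of the tokenizer special-case idea, not the original code. It assumes a recent spaCy (v3) and therefore uses the current `Matcher.add(key, patterns)` signature rather than the older `entity_key`/`specs` form from the question. It also assumes the cause is that the default English tokenizer splits "Buy-Write" on the hyphen, so a pattern expecting it as one token never matches. A blank pipeline is used so no model download is needed.

```python
import spacy
from spacy.matcher import Matcher
from spacy.symbols import ORTH

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")

text = "What factors is AdvisorShares STAR Global Buy-Write ETF exposed with?"

# Inspect the default tokenization to see how "Buy-Write" is handled.
default_tokens = [token.text for token in nlp(text)]
print(default_tokens)

# Register a special case so "Buy-Write" stays a single token.
nlp.tokenizer.add_special_case("Buy-Write", [{ORTH: "Buy-Write"}])

# One pattern entry per expected token, matched on exact text.
words = ["AdvisorShares", "STAR", "Global", "Buy-Write", "ETF"]
pattern = [{"ORTH": word} for word in words]

matcher = Matcher(nlp.vocab)
matcher.add("COMPANY", [pattern])

# Re-tokenize after adding the special case, then run the matcher.
doc = nlp(text)
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

An alternative that leaves the tokenizer untouched would be to write the pattern to mirror the default tokenization, with separate entries for "Buy", "-" and "Write". Printing `default_tokens` first shows which of the two approaches a given pipeline needs.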
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.