Noun phrase merge is failing
This is now failing:
```
>>> doc = nlp('The cat sat on the mat')
>>> for np in doc.noun_chunks:
...     np.merge(np.root.tag_, np.text, np.root.ent_type_)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-409-f6294d1a1cf8> in <module>()
      1 doc = nlp('The cat sat on the mat')
----> 2 for np in doc.noun_chunks:
      3     np.merge(np.root.tag_, np.text, np.root.ent_type_)

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in noun_chunks (spacy/tokens/doc.cpp:7745)()

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/syntax/iterators.pyx in english_noun_chunks (spacy/syntax/iterators.cpp:1559)()

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in spacy.tokens.doc.Doc.__getitem__ (spacy/tokens/doc.cpp:4853)()

IndexError: list index out of range
```
Issue Analytics
- State:
- Created 7 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
Top GitHub Comments
Ah, this was dumb, sorry — I didn’t have time to really look at this, now that I have it’s obvious there’s a problem. Actually I’m not sure how the code was working before. I think there was always a bug here.
Please work around this for now by doing `for np in list(doc.noun_chunks):`. The problem is that we're changing the tokenisation out from underneath the iterator we're yielding from, and this is causing problems. I think this is always going to be hard to get right, and I'm going to change the noun chunks code to accumulate the spans before it yields them.
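The `list()` workaround helps because it exhausts the generator before any tokenisation changes. A minimal plain-Python sketch of the underlying failure mode (no spaCy; `chunks` and `merge` are hypothetical stand-ins for `doc.noun_chunks` and `Span.merge`):

```python
def chunks(tokens):
    # Lazy generator over (start, end) index pairs, standing in for
    # doc.noun_chunks. It re-reads len(tokens) on every step, so any
    # merge performed mid-iteration shifts the indices it yields.
    i = 0
    while i < len(tokens):
        yield (i, i + 2)
        i += 2

def merge(tokens, start, end):
    # Collapse tokens[start:end] into one token, standing in for Span.merge.
    tokens[start:end] = [" ".join(tokens[start:end])]

tokens = ["The", "cat", "the", "mat"]
spans = list(chunks(tokens))          # exhaust the generator up front
for start, end in reversed(spans):    # merge right-to-left so the earlier
    merge(tokens, start, end)         # (start, end) pairs stay valid
print(tokens)  # -> ['The cat', 'the mat']
```

Iterating lazily instead (`for start, end in chunks(tokens): merge(...)`) would compute later spans against an already-shortened list, which is the same index-shifting that produces the `IndexError` in `Doc.__getitem__` above.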
This issue should not have been closed, because it is still present in the spaCy 2.0 alpha. Merging tokens (compounds, entities, matches, etc.) often results in `IndexError: Error calculating span: Can't find start`.
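For reference, spaCy later deprecated `Span.merge` in favour of the `Doc.retokenize` context manager, which queues merges and applies them all at once on exit, resolving every span against the original token indices and so sidestepping this class of bug. A sketch using a blank pipeline (no trained model, so the hard-coded `(0, 2)` and `(4, 6)` boundaries stand in for what `doc.noun_chunks` would return):

```python
import spacy

# Blank English pipeline: tokenizer only, no parser or model download needed.
nlp = spacy.blank("en")
doc = nlp("The cat sat on the mat")

# All merges are queued and applied together when the block exits; both
# spans are specified against the ORIGINAL tokenisation.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])   # "The cat"
    retokenizer.merge(doc[4:6])   # "the mat"

print([t.text for t in doc])  # -> ['The cat', 'sat', 'on', 'the mat']
```

Because the retokenizer defers the mutation, there is no generator being pulled while the tokenisation shifts underneath it.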