
Noun phrase merge is failing

Merging noun chunks while iterating over doc.noun_chunks is now failing:

>>> doc = nlp('The cat sat on the mat')
>>> for np in doc.noun_chunks:
...     np.merge(np.root.tag_, np.text, np.root.ent_type_)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-409-f6294d1a1cf8> in <module>()
      1 doc = nlp('The cat sat on the mat')
----> 2 for np in doc.noun_chunks:
      3     np.merge(np.root.tag_, np.text, np.root.ent_type_)

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in noun_chunks (spacy/tokens/doc.cpp:7745)()

/Users/yaser/miniconda3/envs/spacy/python3.5/site-packages/spacy/syntax/iterators.pyx in english_noun_chunks (spacy/syntax/iterators.cpp:1559)()

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in spacy.tokens.doc.Doc.__getitem__ (spacy/tokens/doc.cpp:4853)()

IndexError: list index out of range

Issue Analytics

Created: 7 years ago
Reactions: 1
Comments: 15 (6 by maintainers)

Top GitHub Comments

honnibal commented, May 20, 2016 (5 reactions)

Ah, this was dumb, sorry. I didn’t have time to look at this properly at first; now that I have, it’s obvious there’s a problem. Actually, I’m not sure how the code was working before; I think there was always a bug here.

Please work around this for now by writing for np in list(doc.noun_chunks):, which collects every chunk before any merge happens. The problem is that we’re changing the tokenisation out from underneath the iterator we’re yielding from.
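The failure described here can be reproduced without spaCy. The sketch below is a toy model, not spaCy's code: tokens are a plain list of strings, and the hypothetical lazy_chunks generator (a determiner plus the following token) captures the list length up front but reads the list itself lazily. Merging inside the loop shrinks the list, so the generator eventually indexes past its end and raises the same IndexError: list index out of range seen in the traceback. Wrapping the generator in list() finishes the scan before anything is merged; because this toy's spans are bare integer offsets (and assumed not to overlap), it then merges from right to left so earlier offsets stay valid.

```python
DETERMINERS = {"the", "a", "an"}

def lazy_chunks(tokens):
    """Hypothetical toy chunker: yields (start, end) for a determiner plus the
    next token. It reads `tokens` lazily, so it sees every change the caller makes."""
    n = len(tokens)  # length captured once, before any merge happens
    for i in range(n):
        if tokens[i].lower() in DETERMINERS:
            yield (i, i + 2)

def merge(tokens, start, end):
    """Replace tokens[start:end] with one space-joined token, in place."""
    tokens[start:end] = [" ".join(tokens[start:end])]

def merge_while_iterating(words):
    tokens = list(words)
    for start, end in lazy_chunks(tokens):  # the list shrinks under the generator
        merge(tokens, start, end)
    return tokens

def merge_after_materializing(words):
    tokens = list(words)
    spans = list(lazy_chunks(tokens))  # finish the scan before mutating anything
    # Integer offsets go stale once an earlier merge shifts the tokens,
    # so apply the merges from right to left.
    for start, end in reversed(spans):
        merge(tokens, start, end)
    return tokens

words = "The cat sat on the mat".split()
try:
    merge_while_iterating(words)
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
print(merge_after_materializing(words))  # ['The cat', 'sat', 'on', 'the mat']
```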

I think this is always going to be hard to get right, and I’m going to change the noun chunks code to accumulate the spans before it yields them.
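Accumulating the spans before yielding them can be sketched with the same kind of toy model: a list of token strings and a hypothetical determiner-plus-noun chunker (none of these names are spaCy APIs). Scanning first means the generator never reads the list while the caller is mutating it. Because the spans here are plain integer offsets computed against the unmerged list, the caller still shifts each one left by the number of tokens earlier merges removed; the sketch assumes the spans come out sorted and non-overlapping, which this chunker guarantees by skipping the noun after each determiner.

```python
DETERMINERS = {"the", "a", "an"}

def eager_chunks(tokens):
    """Hypothetical toy chunker: find every determiner + next-token span first,
    then yield them. Nothing is yielded until the whole scan is done, so later
    changes to `tokens` cannot disturb the scan."""
    spans = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in DETERMINERS and i + 1 < len(tokens):
            spans.append((i, i + 2))
            i += 2  # skip the noun so spans never overlap
        else:
            i += 1
    yield from spans

def merge(tokens, start, end):
    """Replace tokens[start:end] with one space-joined token, in place."""
    tokens[start:end] = [" ".join(tokens[start:end])]

def merge_chunks(words):
    tokens = list(words)
    removed = 0  # tokens eliminated by earlier merges
    for start, end in eager_chunks(tokens):
        # The spans were computed against the unmerged list; shift them left.
        start, end = start - removed, end - removed
        merge(tokens, start, end)
        removed += (end - start) - 1
    return tokens

print(merge_chunks("The cat sat on the mat".split()))  # ['The cat', 'sat', 'on', 'the mat']
```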

anna-hope commented, Sep 7, 2017 (1 reaction)

This issue should not have been closed: it is still present in the spaCy 2.0 alpha. Merging tokens (compounds, entities, matches, etc.) often raises IndexError: Error calculating span: Can't find start.
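One way to avoid merging inside a loop over spans altogether is to defer the merges: record them while iterating and apply them all at once afterwards. spaCy 2.1 and later take this approach with the Doc.retokenize() context manager, whose merges are applied when the with block exits. The sketch below only imitates that idea on a plain list of strings; deferred_merges is a hypothetical helper, not spaCy's implementation. Overlapping merges (for example, a compound and an entity that share tokens) are one plausible trigger for errors like the one above, so the helper rejects them before touching the list rather than guessing how to combine them.

```python
from contextlib import contextmanager

@contextmanager
def deferred_merges(tokens):
    """Hypothetical toy: collect (start, end) merges and apply them all on exit.

    `tokens` is left untouched while the with-block runs, so the block can
    iterate over it freely.
    """
    pending = []
    yield pending.append  # the caller registers a span by calling this
    spans = sorted(set(pending))
    # Refuse overlapping spans before modifying anything.
    for (s1, e1), (s2, e2) in zip(spans, spans[1:]):
        if s2 < e1:
            raise ValueError(f"overlapping merges: {(s1, e1)} and {(s2, e2)}")
    # Apply right to left so the offsets of earlier spans stay valid.
    for start, end in reversed(spans):
        tokens[start:end] = [" ".join(tokens[start:end])]

tokens = "The cat sat on the mat".split()
with deferred_merges(tokens) as merge:
    for i, tok in enumerate(tokens):
        if tok.lower() in {"the", "a", "an"} and i + 1 < len(tokens):
            merge((i, i + 2))
print(tokens)  # ['The cat', 'sat', 'on', 'the mat']
```

If the with block raises, or the recorded spans overlap, the list is left exactly as it was, so there is never a half-merged state to recover from.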
