Noun phrase merge is failing

This is now failing:

>>> doc = nlp('The cat sat on the mat')
>>> for np in doc.noun_chunks:
...     np.merge(np.root.tag_, np.text, np.root.ent_type_)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-409-f6294d1a1cf8> in <module>()
      1 doc = nlp('The cat sat on the mat')
----> 2 for np in doc.noun_chunks:
      3     np.merge(np.root.tag_, np.text, np.root.ent_type_)

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in noun_chunks (spacy/tokens/doc.cpp:7745)()

/Users/yaser/miniconda3/envs/spacy/python3.5/site-packages/spacy/syntax/iterators.pyx in english_noun_chunks (spacy/syntax/iterators.cpp:1559)()

/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in spacy.tokens.doc.Doc.__getitem__ (spacy/tokens/doc.cpp:4853)()

IndexError: list index out of range

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 15 (6 by maintainers)

Top GitHub Comments

5 reactions
honnibal commented, May 20, 2016

Ah, this was dumb, sorry. I didn’t have time to really look at this before; now that I have, it’s obvious there’s a problem. Actually, I’m not sure how the code was working before. I think there was always a bug here.

Please work around this for now by doing for np in list(doc.noun_chunks). The problem is that we’re changing the tokenisation out from underneath the iterator we’re yielding from, so the iterator ends up indexing tokens that no longer exist, which raises the IndexError above.

I think this is always going to be hard to get right, and I’m going to change the noun chunks code to accumulate the spans before it yields them.
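
The failure mode and the list(...) workaround described in the comment above can be illustrated without spaCy. The sketch below is a toy analogue only: toy_noun_chunks, merge, merge_lazily and merge_safely are hypothetical names invented for this illustration, a plain Python list stands in for the Doc, and the "the + next word" rule is a stand-in for real noun chunking. It is not how spaCy implements noun_chunks or merging.

```python
# Toy analogue of the issue; none of these names are spaCy APIs.

def toy_noun_chunks(tokens):
    """Lazily yield (start, end) for each "the <word>" pair in tokens."""
    n = len(tokens)  # length captured once, before the caller merges anything
    for i in range(n):
        # Raises IndexError if the caller has shrunk `tokens` in the meantime.
        if tokens[i].lower() == "the":
            yield i, i + 2


def merge(tokens, start, end):
    """Replace tokens[start:end] with one joined token, in place."""
    tokens[start:end] = [" ".join(tokens[start:end])]


def merge_lazily(tokens):
    """Merge while the generator is still running (the failing pattern)."""
    for start, end in toy_noun_chunks(tokens):
        merge(tokens, start, end)
    return tokens


def merge_safely(tokens):
    """Collect every span first, then merge (the list(...) workaround)."""
    spans = list(toy_noun_chunks(tokens))
    # Merge right to left so earlier integer indices stay valid in this toy.
    for start, end in reversed(spans):
        merge(tokens, start, end)
    return tokens


if __name__ == "__main__":
    try:
        merge_lazily("The cat sat on the mat".split())
    except IndexError as err:
        print("lazy iteration failed:", err)
    print(merge_safely("The cat sat on the mat".split()))
```

In this toy, materialising the spans with list(...) means the generator has finished before any token is merged, so it never indexes into a list that has shrunk. The right-to-left merge order is needed only because the toy tracks spans as plain integer indices; it is an assumption of this sketch, not something stated in the thread.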

1 reaction
anna-hope commented, Sep 7, 2017

This issue should not have been closed, because it is still present in spaCy 2.0 alpha. Merging tokens (compounds, entities, matches, etc.) often results in this error: IndexError: Error calculating span: Can't find start.
