How to integrate util.filter_spans in nlp.pipe()? - ValueError: [E102] Can't merge non-disjoint spans.

Hi, I’ve added nlp.create_pipe("merge_noun_chunks") to my nlp pipeline as described here: https://spacy.io/api/pipeline-functions. When I run the pipeline on large amounts of text, I sometimes get the following error. (For some corpora I get the error, for others I don’t, probably depending on which sentences they happen to contain.)

ValueError: [E102] Can't merge non-disjoint spans. 'online' is already part of tokens to merge. If you want to find the longest non-overlapping spans, you can use the util.filter_spans helper:
https://spacy.io/api/top-level#util.filter_spans
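
To illustrate what "longest non-overlapping spans" means in the error message, here is a minimal pure-Python sketch of that selection rule. It works on plain (start, end) token-index pairs rather than spaCy Span objects, and the function name longest_non_overlapping is hypothetical; it shows the idea only and is not spaCy's actual implementation of util.filter_spans.

```python
def longest_non_overlapping(spans):
    """Keep the longest spans that do not share any token index.

    `spans` is a list of (start, end) pairs with an exclusive end.
    Longer spans win; among equally long spans, the earlier one wins.
    """
    # Longest first; ties broken by the smaller start index.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    taken = set()  # token indices already covered by a kept span
    kept = []
    for start, end in ordered:
        indices = range(start, end)
        if start < end and not any(i in taken for i in indices):
            kept.append((start, end))
            taken.update(indices)
    # Return the survivors in document order.
    return sorted(kept)


# (1, 4) and (3, 6) overlap on token 3; (0, 2) overlaps (1, 4) on token 1.
candidates = [(0, 2), (1, 4), (5, 6), (3, 6)]
result = longest_non_overlapping(candidates)
print(result)
```

Merging only the spans that survive such a filter guarantees that no token is claimed by two merges, which is the condition E102 complains about.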

I saw in other issues (e.g. https://github.com/explosion/spaCy/issues/3687), that this can be solved with the util.filter_spans function, but I don’t understand how to integrate this helper function in an nlp.pipe pipeline.

Thanks for your advice 😃

How to reproduce the behaviour

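import spacy
from spacy.tokens import Doc

# The custom attributes must be registered before use (assumed to happen elsewhere in the original code).
for name in ("Date", "Category", "ID"):
    Doc.set_extension(name, default=None)

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(nlp.create_pipe("merge_noun_chunks"))

# context_tpl_lst: list of (text, context_dict) tuples, not shown in the original report
docs = []
for doc, context in nlp.pipe(context_tpl_lst, as_tuples=True, n_process=1):
    doc._.Date = context["Date"]
    doc._.Category = context["Category"]
    doc._.ID = context["ID"]
    docs.append(doc)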

(Unfortunately I can’t give you a specific string or context_tpl_lst object, because I don’t know which sentence in my corpus is causing the error)
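
One possible workaround, sketched under assumptions: replace the built-in merge_noun_chunks component with a small custom function that passes the noun chunks through util.filter_spans before merging. The names merge_spans_filtered and merge_noun_chunks_filtered are hypothetical and not part of spaCy. The demo at the bottom uses a blank English pipeline and hand-picked overlapping spans so that it runs without a trained model; the noun-chunk variant assumes the parser has already run on the doc.

```python
import spacy
from spacy.util import filter_spans


def merge_spans_filtered(doc, spans):
    """Merge spans into single tokens after dropping overlapping ones."""
    # filter_spans keeps the longest spans and discards any that overlap them.
    filtered = filter_spans(spans)
    with doc.retokenize() as retokenizer:
        for span in filtered:
            retokenizer.merge(span)
    return doc


def merge_noun_chunks_filtered(doc):
    """Hypothetical stand-in for the built-in merge_noun_chunks component.

    Assumes the dependency parser has already processed `doc`, because
    doc.noun_chunks needs the parse.
    """
    return merge_spans_filtered(doc, list(doc.noun_chunks))


# In spaCy 2.x a plain function can be added as a pipeline component
# in place of the built-in one, for example:
#   nlp.add_pipe(merge_noun_chunks_filtered, name="merge_noun_chunks_filtered")

# Model-free demo: three overlapping spans over "New York City".
nlp = spacy.blank("en")
doc = nlp("I like New York City in autumn")
overlapping = [doc[2:4], doc[3:5], doc[2:5]]
doc = merge_spans_filtered(doc, overlapping)
tokens = [token.text for token in doc]
print(tokens)
```

Merging the three spans directly would raise E102, since "York" belongs to all of them; after filtering, only the longest span is merged. Because the component is added to the pipeline itself, the nlp.pipe loop above would not need to change.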

Your Environment

spaCy version: 2.2.3
Platform: Darwin-19.4.0-x86_64-i386-64bit
Python version: 3.7.6

Top GitHub Comments

MoritzLaurer commented, May 22, 2020

Thanks for fixing this 😃 @honnibal @svlandeg @adrianeboyd

adrianeboyd commented, May 19, 2020

An additional text from #5458:

text = "In an era where markets have brought prosperity and empowerment, this leader clings to a bankrupt ideology that has brought Cuba's workers and farmers and families nothing -- nothing -- but isolation and misery."