How to integrate util.filter_spans in nlp.pipe()? - ValueError: [E102] Can't merge non-disjoint spans.

Hi, I’ve added nlp.create_pipe("merge_noun_chunks") to my nlp pipeline as described here: https://spacy.io/api/pipeline-functions. When I run the pipeline on large amounts of text, I sometimes get the following error (for some corpora I get the error, for others I don’t, probably depending on which sentences they happen to contain).

ValueError: [E102] Can't merge non-disjoint spans. 'online' is already part of tokens to merge. If you want to find the longest non-overlapping spans, you can use the util.filter_spans helper:
https://spacy.io/api/top-level#util.filter_spans
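
To make "longest non-overlapping spans" concrete, here is a minimal sketch of that selection rule on plain (start, end) token offsets. It is not spaCy's code: the function name longest_disjoint is hypothetical, and it assumes that longer spans win and that, on equal length, the earlier span is kept.

```python
def longest_disjoint(spans):
    """Pick the longest non-overlapping (start, end) spans; end is exclusive."""
    # Longest first; among spans of equal length, the earlier start wins
    # (assumed tie-breaking rule, see the note above).
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept, seen = [], set()
    for start, end in ordered:
        # Drop a candidate if its first or last token is already taken.
        if start not in seen and end - 1 not in seen:
            kept.append((start, end))
            seen.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


candidates = [(0, 3), (2, 4), (5, 6), (5, 8)]
print(longest_disjoint(candidates))  # [(0, 3), (5, 8)]
```

Here (2, 4) overlaps the longer (0, 3) and (5, 6) sits inside (5, 8), so both are dropped. Overlapping candidates of this kind are what the merge step rejects with E102.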

I saw in other issues (e.g. https://github.com/explosion/spaCy/issues/3687), that this can be solved with the util.filter_spans function, but I don’t understand how to integrate this helper function in an nlp.pipe pipeline.

Thanks for your advice 😃

How to reproduce the behaviour

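import spacy
from spacy.tokens import Doc

# The custom attributes must be registered before they can be set on doc._
# (assumed to happen elsewhere in the original code).
for name in ("Date", "Category", "ID"):
    Doc.set_extension(name, default=None)

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(nlp.create_pipe("merge_noun_chunks"))

# context_tpl_lst is a list of (text, context_dict) tuples, not shown here
docs = []
for doc, context in nlp.pipe(context_tpl_lst, as_tuples=True, n_process=1):
    doc._.Date = context["Date"]
    doc._.Category = context["Category"]
    doc._.ID = context["ID"]
    docs.append(doc)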

(Unfortunately I can’t give you a specific string or context_tpl_lst object, because I don’t know which sentence in my corpus is causing the error)
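
One way the helper named in the error message might be wired into this pipeline is a small custom component that filters the noun chunks before merging them, used in place of the built-in merge_noun_chunks. This is only a sketch under assumptions, not a fix confirmed in the thread. It targets the spaCy 2.x API used above (doc.is_parsed, passing a callable to nlp.add_pipe). The names merge_filtered_noun_chunks, filter_fn and build_nlp are hypothetical. Unlike the built-in component, it does not set any attributes on the merged token.

```python
def merge_filtered_noun_chunks(doc, filter_fn=None):
    """Pipeline component: merge noun chunks after removing overlapping ones."""
    if filter_fn is None:
        # Imported lazily so the function can also be tried with a stand-in filter.
        from spacy.util import filter_spans as filter_fn
    if not doc.is_parsed:  # spaCy 2.x flag; noun chunks need the dependency parse
        return doc
    # Collect the chunks first, then keep only the non-overlapping ones.
    spans = filter_fn(list(doc.noun_chunks))
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)
    return doc


def build_nlp(model_name="en_core_web_sm"):
    """Load a model and add the component above instead of merge_noun_chunks."""
    import spacy

    nlp = spacy.load(model_name)
    # spaCy 2.x style: pass the callable itself and place it after the parser.
    nlp.add_pipe(
        merge_filtered_noun_chunks,
        name="merge_filtered_noun_chunks",
        after="parser",
    )
    return nlp
```

With nlp = build_nlp(), the nlp.pipe(context_tpl_lst, as_tuples=True, n_process=1) loop above should work unchanged, since the component is an ordinary pipeline step that receives and returns each doc.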

Your Environment

  • spaCy version: 2.2.3
  • Platform: Darwin-19.4.0-x86_64-i386-64bit
  • Python version: 3.7.6

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
MoritzLaurer commented, May 22, 2020

thanks for fixing this 😃 @honnibal @svlandeg @adrianeboyd

1 reaction
adrianeboyd commented, May 19, 2020

An additional text from #5458:

text = "In an era where markets have brought prosperity and empowerment, this leader clings to a bankrupt ideology that has brought Cuba's workers and farmers and families nothing -- nothing -- but isolation and misery."