How to integrate util.filter_spans in nlp.pipe()? - ValueError: [E102] Can't merge non-disjoint spans.
Hi, I’ve added nlp.create_pipe("merge_noun_chunks") to my nlp pipeline as described here: https://spacy.io/api/pipeline-functions. When I run the pipeline on large amounts of text, I sometimes get the following error. (For some corpora I get the error, for others I don’t, probably depending on which sentences they happen to contain.)
ValueError: [E102] Can't merge non-disjoint spans. 'online' is already part of tokens to merge. If you want to find the longest non-overlapping spans, you can use the util.filter_spans helper:
https://spacy.io/api/top-level#util.filter_spans
I saw in other issues (e.g. https://github.com/explosion/spaCy/issues/3687), that this can be solved with the util.filter_spans function, but I don’t understand how to integrate this helper function in an nlp.pipe pipeline.
Thanks for your advice 😃
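To make the error message's suggestion concrete, here is a minimal pure-Python sketch of the selection rule the spaCy docs describe for util.filter_spans: prefer the longest span, and among equally long spans the one that starts first. It is an illustration under assumptions, not spaCy's implementation: spans are modelled as hypothetical (start, end) token-index pairs with an exclusive end, and the name filter_overlapping is made up for this example.

```python
from typing import List, Tuple

# Assumption: a span is a (start, end) pair of token indices, end exclusive.
Span = Tuple[int, int]


def filter_overlapping(spans: List[Span]) -> List[Span]:
    """Keep the longest non-overlapping spans; ties go to the earlier span."""
    # Longest first; among equal lengths, the span that starts earlier wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Span] = []
    seen_tokens = set()
    for start, end in ordered:
        # Skip a span if any of its tokens is already claimed by a kept span.
        if any(i in seen_tokens for i in range(start, end)):
            continue
        kept.append((start, end))
        seen_tokens.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


# (2, 4) overlaps (0, 3) and (5, 6) is contained in (5, 8), so both are dropped.
candidates = [(0, 3), (2, 4), (5, 6), (5, 8)]
print(filter_overlapping(candidates))  # [(0, 3), (5, 8)]
```

In a pipeline, the same idea would presumably be applied inside a custom component: collect the candidate spans, pass them through spacy.util.filter_spans, and only then merge the result within doc.retokenize(), so that no token is asked to merge twice.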
How to reproduce the behaviour
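import spacy

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(nlp.create_pipe("merge_noun_chunks"))
docs = []
# context_tpl_lst is a list of (text, context_dict) tuples
for doc, context in nlp.pipe(context_tpl_lst, as_tuples=True, n_process=1):
    # Date, Category and ID are custom Doc extensions, assumed to be registered via Doc.set_extension
    doc._.Date = context["Date"]
    doc._.Category = context["Category"]
    doc._.ID = context["ID"]
    docs.append(doc)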
(Unfortunately I can’t give you a specific string or context_tpl_lst object, because I don’t know which sentence in my corpus is causing the error.)
Your Environment
- spaCy version: 2.2.3
- Platform: Darwin-19.4.0-x86_64-i386-64bit
- Python version: 3.7.6
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (3 by maintainers)
Top GitHub Comments
thanks for fixing this 😃 @honnibal @svlandeg @adrianeboyd
An additional text from #5458: