pipe(): ValueError: Error parsing doc
I found strange behaviour using the pipe() method (only verified on the German model): if you parse a document using pipe(), you can get a ValueError, while if I use nlp(text), everything is fine. I boiled it down to single words: German words work, but English words like 'Windows' don't.
Steps to reproduce:

```python
import spacy

nlp = spacy.load('de')

def texts():
    yield "Windows"

for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
    print(len(doc))  # doc access -> ValueError
```
Trace:

```
ValueError                                Traceback (most recent call last)
<ipython-input-2-9a095ec5505b> in <module>()
      8 def texts():
      9     yield "Windows"
---> 10 for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
     11     print(len(doc))

.../venv/lib/python3.4/site-packages/spacy/language.py in pipe(self, texts, tag, parse, entity, n_threads, batch_size)
    254             stream = self.entity.pipe(stream,
    255                 n_threads=1, batch_size=batch_size)
--> 256         for doc in stream:
    257             yield doc
    258

ValueError: Error parsing doc: Windows
```
If you call nlp("Windows") directly, it works fine. Also, if you execute nlp("Windows") before the same pipe() call, pipe() does not raise an exception (is some dictionary built on the first call?).
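Since nlp(text) works where pipe() fails, one possible stopgap is to batch the texts yourself and redo only a failing batch one text at a time. This is a minimal sketch in plain Python with stub functions so it runs without spaCy. `pipe_with_fallback` and `flaky_batch` are hypothetical names, and passing something like `lambda batch: nlp.pipe(batch)` as `batch_fn` and `nlp` as `single_fn` is an untested assumption, not a documented spaCy workaround.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")  # input item type, e.g. str
D = TypeVar("D")  # output item type, e.g. a parsed document


def pipe_with_fallback(
    texts: Iterable[T],
    batch_fn: Callable[[List[T]], Iterable[D]],
    single_fn: Callable[[T], D],
    batch_size: int = 1000,
) -> Iterator[D]:
    """Hypothetical helper (not part of spaCy).

    Processes texts in batches with batch_fn; if a batch raises ValueError,
    that batch is redone item by item with single_fn.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(texts)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        try:
            # Materialize the batch: a generator that has raised cannot resume.
            results = list(batch_fn(batch))
        except ValueError:
            # Batch path failed: fall back to the per-item path for this batch only.
            results = [single_fn(text) for text in batch]
        yield from results


def flaky_batch(batch: List[str]) -> Iterator[str]:
    # Stub batch function that fails on one specific input, like pipe() in this issue.
    for text in batch:
        if text == "Windows":
            raise ValueError("Error parsing doc: " + text)
        yield text.upper()


print(list(pipe_with_fallback(["Haus", "Windows"], flaky_batch, str.upper)))
# ['HAUS', 'WINDOWS']
```

Collecting each batch into a list before yielding trades some memory for the ability to retry: results only leave the helper once their whole batch has succeeded, so a failure never produces a partially yielded, unrecoverable stream.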
Versions:
- Python 3.4.3 (the problem is not related to IPython)
- spaCy 0.101.0
Maybe this is related to this region of syntax/parser.pyx:

```cython
if not eg.is_valid[guess]:
    # with gil:
    #     move_name = self.moves.move_name(action.move, action.label)
    #     print 'invalid action:', move_name
    return 1
```
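A toy model of the suspicion above, assuming (the issue does not confirm it) that the per-document parse routine reports an invalid guessed action only through a non-zero status code, and that the batch path turns that status into the ValueError seen in the trace. Everything here (`toy_parse_c`, `toy_pipe`, `VALID_ACTIONS`, `toy_guess`) is hypothetical and is not spaCy's code; in a real transition-based parser, which actions are valid depends on the current parse state rather than on a fixed set.

```python
from typing import Callable, Iterable, Iterator, List, Set

# Hypothetical toy model, NOT spaCy's implementation.
VALID_ACTIONS: Set[str] = {"SHIFT", "REDUCE", "LEFT", "RIGHT"}


def toy_parse_c(tokens: List[str], guess: Callable[[str], str]) -> int:
    """Toy per-document routine: returns 0 on success, 1 on an invalid guess."""
    for token in tokens:
        action = guess(token)
        if action not in VALID_ACTIONS:
            # Analogue of `if not eg.is_valid[guess]: return 1`:
            # no exception is raised here, only a non-zero status code.
            return 1
    return 0


def toy_pipe(texts: Iterable[str], guess: Callable[[str], str]) -> Iterator[List[str]]:
    """Toy batch driver that converts a non-zero status into ValueError."""
    for text in texts:
        tokens = text.split()
        if toy_parse_c(tokens, guess) != 0:
            raise ValueError("Error parsing doc: %s" % text)
        yield tokens


def toy_guess(token: str) -> str:
    # Deliberately faulty toy model: for one token it proposes an
    # action that is not in VALID_ACTIONS.
    return "INVALID" if token == "Windows" else "SHIFT"


print(list(toy_pipe(["Ein Haus"], toy_guess)))  # [['Ein', 'Haus']]
try:
    list(toy_pipe(["Windows"], toy_guess))
except ValueError as err:
    print(err)  # Error parsing doc: Windows
```

The point of the model is only the control flow: a silent `return 1` deep in the parser becomes visible as an exception exclusively on the code path that checks the status.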
Issue Analytics
- Created 7 years ago
- Comments: 24 (15 by maintainers)
Top GitHub Comments
I think this should fix the segfault too — I think they were related.
Closing for now. Again, if it reoccurs, don’t hesitate to reopen 😃
Yes. I have the test set up and I’m pretty sure I understand the problem now. Fix should be out soon.