pipe(): ValueError Error parsing doc

I found strange behaviour when using the pipe() method (verified only with the German model):

Parsing a document with pipe() can raise a ValueError, while calling nlp(text) on the same text works fine. I boiled it down to single words: German words work, but English words such as ‘Windows’ do not.

Steps to reproduce:

import spacy
nlp = spacy.load('de')
def texts():
    yield "Windows"
for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
    print(len(doc))  # not reached: the ValueError is raised while iterating pipe()

Trace

ValueError                                Traceback (most recent call last)
<ipython-input-2-9a095ec5505b> in <module>()
      8 def texts():
      9     yield "Windows"
---> 10 for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
     11     print(len(doc))

.../venv/lib/python3.4/site-packages/spacy/language.py in pipe(self, texts, tag, parse, entity, n_threads, batch_size)
    254             stream = self.entity.pipe(stream,
    255                 n_threads=1, batch_size=batch_size)
--> 256         for doc in stream:
    257             yield doc
    258 
ValueError: Error parsing doc: Windows

Calling nlp("Windows") directly works fine. Also, if nlp("Windows") is executed before the same pipe() call, pipe() does not raise an exception (perhaps a dictionary is built on first use?).
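
Until a fixed release is installed, the observation above suggests a stopgap: catch the ValueError from pipe() and re-parse the affected batch with the single-document call. The following is only a minimal sketch. `pipe_with_fallback` is a hypothetical helper, not part of spaCy, and it assumes only that `nlp` is callable on one text and has a `pipe()` method that accepts an iterable of texts, as in the report. It does not address the underlying parser bug.

```python
from itertools import islice


def pipe_with_fallback(nlp, texts, batch_size=1000):
    """Yield docs via nlp.pipe(); re-parse a batch with nlp(text) on ValueError.

    Hypothetical helper: `nlp` is assumed to be callable on a single text
    and to expose pipe(iterable_of_texts), as in the report above.
    """
    texts = iter(texts)
    while True:
        # Take the next batch so the texts are still available for a retry.
        batch = list(islice(texts, batch_size))
        if not batch:
            break
        try:
            # Materialize the whole batch so a failure cannot yield partial output.
            docs = list(nlp.pipe(batch))
        except ValueError:
            # Fall back to the single-document call, which the report says works.
            docs = [nlp(text) for text in batch]
        for doc in docs:
            yield doc
```

A batch that falls back is parsed one text at a time, so it gets none of the throughput benefit of pipe(); smaller batches limit how much work is repeated after a failure.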

Versions:

Python 3.4.3 (the problem is not related to IPython)
spaCy 0.101.0

Maybe this is related to this region in syntax/parser.pyx:

if not eg.is_valid[guess]:
    # with gil:
    #     move_name = self.moves.move_name(action.move, action.label)
    #     print 'invalid action:', move_name
    return 1

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 24 (15 by maintainers)

Top GitHub Comments

1 reaction
honnibal commented, Oct 27, 2016

I think this should fix the segfault too — I think they were related.

Closing for now. Again, if it reoccurs, don’t hesitate to reopen 😃

1 reaction
honnibal commented, Oct 27, 2016

Yes. I have the test set up and I’m pretty sure I understand the problem now. Fix should be out soon.
