KeyError [E018] when using nlp.pipe with n_process > 1
How to reproduce the behaviour
Hi, I’m trying to use the new ja_core_news_sm model to stream process a collection of sentences with nlp.pipe(list_of_sentences). I’d like to be able to set n_process > 1 to increase speed, but when I do that I encounter KeyError [E018]. I’m using WSL through VSCode.
I’ll try something like this…
import collections
import spacy

Word = collections.namedtuple("Word", ["surface", "lemma", "upos", "xpos", "dep"])
nlp = spacy.load("ja_core_news_sm", disable=["ner", "entity_linker"])
# sentences: the list of sentence strings to process (not shown in this report)
for doc in nlp.pipe(sentences, batch_size=150, n_process=2):
    for token in doc:
        word = Word(surface=token.text, lemma=token.lemma_, upos=token.pos_, xpos=token.tag_, dep=token.dep_)
and get this error.
word = Word(surface=token.text, lemma=token.lemma_, upos=token.pos_, xpos=token.tag_, dep=token.dep_)
File "token.pyx", line 894, in spacy.tokens.token.Token.lemma_.__get__
File "strings.pyx", line 136, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '17260935250788936050'. This usually refers to an issue with the `Vocab` or `StringStore`."
The failing token wasn’t the first one in the sentence, so I counted the number of tokens throwing exceptions. In one collection of sentences I have, 397 of 53597 iterated tokens raise an exception. So far the number of failures has stayed constant across re-runs with varying batch_size and n_process.
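The count described above could be gathered with something like the following. This is only a minimal sketch, not the code used for the report: the helper name count_lemma_failures is hypothetical, and it assumes a token counts as failing whenever reading token.lemma_ raises KeyError, as in the traceback.

```python
from typing import Iterable, Tuple


def count_lemma_failures(docs: Iterable) -> Tuple[int, int]:
    """Return (failed, total) over all tokens in docs.

    Hypothetical helper: a token is counted as failed when reading its
    lemma_ attribute raises KeyError, which is where E018 surfaces in
    the traceback above.
    """
    failed = 0
    total = 0
    for doc in docs:
        for token in doc:
            total += 1
            try:
                token.lemma_  # attribute access alone triggers the lookup
            except KeyError:
                failed += 1
    return failed, total
```

Calling count_lemma_failures(nlp.pipe(sentences, batch_size=150, n_process=2)) and then count_lemma_failures(nlp.pipe(sentences)) would allow comparing the multi-process and single-process runs on the same input.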
Just to sanity check, a bare nlp.pipe() or [nlp(s) for s in sentences] works with no issues. Possibly a model-specific issue?
Your Environment
- Operating System: Windows 10 (Build 18363)
- Python Version Used: 3.8.1
- spaCy Version Used: 2.3.0
- Environment Information: WSL (Linux-4.4.0-18362-Microsoft-x86_64-with-glibc2.27)
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Thanks for the report, that does look like a related bug. I’ll look into it!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.