KeyError [E018] when using nlp.pipe with n_process > 1

See original GitHub issue

How to reproduce the behaviour

Hi, I’m trying to use the new ja_core_news_sm model to stream process a collection of sentences with nlp.pipe(list_of_sentences). I’d like to be able to set n_process > 1 to increase speed, but when I do that I encounter KeyError [E018]. I’m using WSL through VSCode.

I’ll try something like this…


import collections
import spacy

Word = collections.namedtuple("Word", ["surface", "lemma", "upos", "xpos", "dep"])

nlp = spacy.load("ja_core_news_sm", disable=["ner", "entity_linker"])

# sentences: a list of Japanese sentence strings
for doc in nlp.pipe(sentences, batch_size=150, n_process=2):
    for token in doc:
        word = Word(surface=token.text, lemma=token.lemma_, upos=token.pos_, xpos=token.tag_, dep=token.dep_)

and get this error.

word = Word(surface=token.text, lemma=token.lemma_, upos=token.pos_, xpos=token.tag_, dep=token.dep_)
  File "token.pyx", line 894, in spacy.tokens.token.Token.lemma_.__get__
  File "strings.pyx", line 136, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '17260935250788936050'. This usually refers to an issue with the `Vocab` or `StringStore`."

The failing token wasn’t the first one in the sentence, so I counted the tokens that throw exceptions: in one collection of sentences, 397 of 53,597 iterated tokens fail. So far the number of failures has stayed constant across re-runs with different batch_size and n_process values.
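The count described above could be gathered with something like the following. This is only a sketch: `count_failing_tokens` and `ATTRS` are hypothetical names, not part of spaCy, and it assumes the KeyError is raised on attribute access (as in the traceback) rather than while iterating over the pipe itself.

```python
from typing import Iterable, Tuple

# Token attributes read in the snippet from the report.
ATTRS = ("text", "lemma_", "pos_", "tag_", "dep_")


def count_failing_tokens(docs: Iterable[Iterable[object]], attrs=ATTRS) -> Tuple[int, int]:
    """Return (failed, total) over all tokens in docs.

    A token counts as failed if reading any of the listed attributes
    raises KeyError, which is how E018 surfaces in the traceback.
    """
    failed = 0
    total = 0
    for doc in docs:
        for token in doc:
            total += 1
            try:
                for name in attrs:
                    getattr(token, name)
            except KeyError:
                failed += 1
    return failed, total
```

It would be called as count_failing_tokens(nlp.pipe(sentences, batch_size=150, n_process=2)), with nlp and sentences as defined in the snippet above.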

As a sanity check, both a bare nlp.pipe() (without n_process) and [nlp(s) for s in sentences] work with no issues. Possibly a model-specific issue?
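For reference, the single-process variant that works for me looks roughly like this. It only sidesteps the error and gives up the speedup from n_process; `extract_words` is a hypothetical helper name, and `nlp` is assumed to be the pipeline loaded with spacy.load("ja_core_news_sm", ...) as above.

```python
import collections

Word = collections.namedtuple("Word", ["surface", "lemma", "upos", "xpos", "dep"])


def extract_words(nlp, sentences, batch_size=150):
    """Collect Word tuples using nlp.pipe in a single process.

    n_process is deliberately not passed, so spaCy's default
    (one process) is used and all strings stay in one StringStore.
    """
    words = []
    for doc in nlp.pipe(sentences, batch_size=batch_size):
        for token in doc:
            words.append(
                Word(
                    surface=token.text,
                    lemma=token.lemma_,
                    upos=token.pos_,
                    xpos=token.tag_,
                    dep=token.dep_,
                )
            )
    return words
```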

Your Environment

  • Operating System: Windows 10 (Build 18363)
  • Python Version Used: 3.8.1
  • spaCy Version Used: 2.3.0
  • Environment Information: WSL (Linux-4.4.0-18362-Microsoft-x86_64-with-glibc2.27)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
adrianeboyd commented, Nov 6, 2020

Thanks for the report, that does look like a related bug. I’ll look into it!

