Workaround for spacy.en.English() load time?
Our file includes the following:
from spacy.en import English
nlp = English()
The English constructor takes quite some time, depending on the machine.
Is there some workaround to speed it up, or something we’re doing wrong?
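Whatever the load cost is, it only needs to be paid once per process. A minimal sketch of that pattern, assuming the spacy.en API from the question (the process() helper is just for illustration):

from spacy.en import English

nlp = English()  # slow: pay the model-load cost once, at startup

def process(texts):
    # reuse the already-loaded pipeline for every document
    return [nlp(text) for text in texts]

Constructing English() inside a per-document function repeats the full model load on every call.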
Issue Analytics
- Created: 8 years ago
- Comments: 12 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Solved!
On my laptop models now load in 13s, down from 90s.
This turned out to be a stupid problem =/. At some point in the many revisions of this code, I lost an important patch: before loading the model, I wasn’t pre-sizing the hash table! If the table is sized exactly up front, insertions are sequential and never trigger a resize. But resizing is very expensive, because we use open addressing and linear probing: when the hash table is resized, all keys must be reinserted.
There are 9 million entries in the table, so this is very expensive.
I feel stupid for not realising that such an extreme loading time had to mean something was wrong. But the important thing is: this will be fixed in the next version.
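To make the cost concrete, here is a minimal sketch in plain Python (not the actual Cython implementation) of an open-addressing table with linear probing. The point is that _resize() must reinsert every existing key, because each key’s slot depends on the table size, so a reserve() call up front — like the lost patch — makes the bulk load resize-free:

class OpenAddressingTable:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # (key, value) pairs or None
        self.filled = 0

    def reserve(self, n):
        # Pre-size for n entries (keeping the load factor under ~0.7)
        # so that no resize ever fires during the bulk insert.
        needed = int(n / 0.7) + 1
        if needed > self.capacity:
            self._resize(needed)

    def _resize(self, new_capacity):
        # The expensive part: every existing entry must be reinserted,
        # since its slot is computed from the (new) table size.
        old_slots = self.slots
        self.capacity = new_capacity
        self.slots = [None] * new_capacity
        self.filled = 0
        for entry in old_slots:
            if entry is not None:
                self.insert(*entry)

    def insert(self, key, value):
        if (self.filled + 1) > 0.7 * self.capacity:
            self._resize(self.capacity * 2)  # rehashes everything so far
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity  # linear probing
        if self.slots[i] is None:
            self.filled += 1
        self.slots[i] = (key, value)

# Without reserve(), a bulk load grows the table by doubling: roughly
# twenty resizes on the way to 9 million entries, each re-hashing
# everything inserted so far. With reserve(n), the load never resizes.
table = OpenAddressingTable()
table.reserve(100000)
for k in range(100000):
    table.insert(k, k)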
I’d love to have more insight into why it takes so long, and what the variance is due to. If you do any benchmarking, let me know! The part that loads the parser model is here:
https://github.com/honnibal/thinc/blob/master/thinc/model.pyx#L89
What we’re doing is looping over successive calls to Reader.read:
https://github.com/honnibal/thinc/blob/master/thinc/model.pyx#L155
The memory is being allocated via this cymem.Pool class:
https://github.com/honnibal/cymem/blob/master/cymem/cymem.pyx#L31
The model.load code is called by spaCy here:
https://github.com/honnibal/spaCy/blob/master/spacy/syntax/parser.pyx#L85
You could verify that this part is indeed the slow part for you by loading nlp = English(parser=False). It’d be good to know whether it’s really the disk reads that are slow, or something else we could do more about, like the hash table insertions or the memory allocations.
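A quick timing sketch for that check, using only the parser=False flag mentioned above (timings will vary by machine and disk):

import time
from spacy.en import English

start = time.time()
nlp_full = English()
print('full pipeline: %.1fs' % (time.time() - start))

start = time.time()
nlp_light = English(parser=False)
print('parser disabled: %.1fs' % (time.time() - start))

If the difference between the two numbers accounts for most of the load time, the model.load path linked above is the bottleneck; if both are slow, look instead at the hash table insertions or the memory allocations.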
But, don’t spend too long on it 😃. As I said, I hope to be replacing this soon.