
Workaround for spacy.en.English() load time?

See original GitHub issue

Our file includes the following:

from spacy.en import English
nlp = English()

The English constructor takes quite some time, depending on the machine.

Is there some workaround to speed it up, or something we’re doing wrong?
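(Editor's note: the mitigation that comes up later in this thread is to skip pipeline components you don't need at construction time, as honnibal suggests below. A minimal sketch, assuming the old spacy.en API used in this issue and the parser=False keyword from his comment; with a modern spaCy you would pass disable=[...] to spacy.load instead:)

from spacy.en import English

# Load once per process and reuse the object everywhere;
# skip the parser model when you don't need it, since that load
# is the suspected slow part discussed below.
nlp = English(parser=False)

def tokens(text):
    return [token.orth_ for token in nlp(text)]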

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

1 reaction
honnibal commented, Feb 1, 2016

Solved!

On my laptop models now load in 13s, down from 90s.

This turned out to be a stupid problem =/. At some point in the many revisions of this code, I lost an important patch: before loading the model, I wasn’t resizing the hash table! If the hash table is sized exactly, insertions are sequential. But resizing is very expensive, because we use open addressing and linear probing. When we resize the hash table, all keys must be reinserted!

There are 9 million entries in the table, so this is very expensive.

I feel stupid for not realising that such an extreme loading time had to mean something was wrong. But the important thing is: this will be fixed in the next version.
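(Editor's note: the cost described above can be sketched with a toy table. This is an illustrative Python mock-up, not thinc's or spaCy's actual implementation: with open addressing and linear probing, every resize has to re-hash and re-insert every key already present, so growing the table incrementally while loading millions of entries is far more expensive than sizing it correctly up front.)

# Toy open-addressing hash table with linear probing, to illustrate
# why resizing during a bulk load is expensive.
class ProbeTable:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.filled = 0
        self.keys = [None] * capacity
        self.vals = [None] * capacity

    def _slot(self, key):
        # Linear probing: start at hash(key) % capacity and walk forward
        # until we find the key or an empty slot.
        i = hash(key) % self.capacity
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.capacity
        return i

    def insert(self, key, val):
        if self.filled * 2 >= self.capacity:   # keep load factor under 0.5
            self._resize(self.capacity * 2)
        i = self._slot(key)
        if self.keys[i] is None:
            self.filled += 1
        self.keys[i] = key
        self.vals[i] = val

    def _resize(self, new_capacity):
        # The expensive part: every existing key is re-hashed and
        # re-inserted into the new arrays. Growing incrementally while
        # loading millions of entries triggers this again and again.
        old = [(k, v) for k, v in zip(self.keys, self.vals) if k is not None]
        self.capacity = new_capacity
        self.keys = [None] * new_capacity
        self.vals = [None] * new_capacity
        self.filled = 0
        for k, v in old:
            i = self._slot(k)
            self.keys[i] = k
            self.vals[i] = v
            self.filled += 1

    def presize(self, n_entries):
        # The lost patch, in essence: size the table for the known entry
        # count up front, so the load never has to resize mid-stream.
        self._resize(max(self.capacity, n_entries * 2))

With presize(9_000_000) called before the load, the rebuild never fires mid-stream; without it, the table doubles roughly twenty times, and each doubling re-inserts everything loaded so far.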

1 reaction
honnibal commented, Jan 7, 2016

I’d love to have more insight into why it takes so long, and what the variance is due to. If you do any benchmarking, let me know! The part that loads the parser model is here:

https://github.com/honnibal/thinc/blob/master/thinc/model.pyx#L89

What we’re doing is looping over successive calls to Reader.read:

https://github.com/honnibal/thinc/blob/master/thinc/model.pyx#L155

The memory is being allocated via this cymem.Pool class:

https://github.com/honnibal/cymem/blob/master/cymem/cymem.pyx#L31

The model.load code is called by spaCy here:

https://github.com/honnibal/spaCy/blob/master/spacy/syntax/parser.pyx#L85

You could verify that this part is indeed the slow part for you by loading nlp = English(parser=False).

It’d be good to know whether it’s indeed the disk reads that are slow, or whether it’s something else that we could do more about, like the hash table insertions or the memory allocations.

But, don’t spend too long on it 😃. As I said, I hope to be replacing this soon.
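(Editor's note: a rough timing sketch along the lines suggested above. load_seconds is a hypothetical helper; it assumes the English(parser=False) keyword from the comment, and it only localises the cost to the parser load, not to disk reads vs. hash insertions vs. allocations:)

import time
from spacy.en import English

def load_seconds(**kwargs):
    start = time.time()
    English(**kwargs)          # discard the object; we only want the timing
    return time.time() - start

print("full load:       %.1fs" % load_seconds())
print("parser disabled: %.1fs" % load_seconds(parser=False))
# A large gap implicates the parser model load (model.pyx linked above);
# a small gap means the time is going into vocab/lexeme loading instead.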

Read more comments on GitHub >

Top Results From Across the Web

Spacy english language model take too long to load
This is really slow because it loads the model for every sentence: import spacy def dostuff(text): nlp = spacy.load("en") return nlp(text).
Read more >
spaCy 101: Everything you need to know
Once you've downloaded and installed a trained pipeline, you can load it via spacy.load . This will return a Language object containing all...
Read more >
How to use the spacy.load function in spacy
import spacy import warnings ''' To use the more accurate but slower model use "en_core_web_lg" otherwise use "en_core_web_sm" ''' nlp = spacy.load(" ...
Read more >
Natural Language Processing With spaCy in Python
Here, the nlp object is a language model instance. You can assume that, throughout this tutorial, nlp refers to the language model loaded...
Read more >
pip install spacy==1.3.0
Currently only models for English and German, named en and de, ... Fix issue #617: Vocab.load() now works with string paths, as well...
Read more >
