2.3.0 models don't work as expected
How to reproduce the behaviour
After upgrading spaCy and the corresponding models to 2.3.0, the loaded models seem to have a very limited vocabulary:
import spacy
nlp = spacy.load('en_core_web_sm') # same with 'en_core_web_md' and 'en_core_web_lg'
len(nlp.vocab) # outputs 478
The output of [w.orth_ for w in nlp.vocab] (shortened):
['nuthin',
'there',
'ü.',
'’nuff',
'havin',
"'bout",
'’Cause',
'Need',
'Somethin',
'gon',
'N.C.',
'\\n',
' ',
'Sept.',
'c.',
'E.G.',
'Mont.',
'b.',
':-}',
'got',
'it',
'Jr.',
'=3',
'>.>',
'Calif.',
':}',
'Ill.',
"O'clock",
"o'clock",
'Mich.',
'is',
':-o',
'n.',
'w/o',
'Might',
'>.<',
':))',
Lexemes outside of this list can still be accessed via e.g. nlp.vocab['aardvark'], but any workflow that requires operating on nlp.vocab is broken. Also, all lexemes have a prob of -20.0:
nlp.vocab['aardvark'].prob # -20.0
Info about spaCy
- spaCy version: 2.3.0
- Platform: Linux-5.6.15-arch1-1-x86_64-with-arch
- Python version: 3.7.3
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (8 by maintainers)
Top GitHub Comments
Install spacy-lookups-data. Then you need one extra line if you restructure it slightly to directly iterate over words with vectors rather than the vocab.
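The code that accompanied this comment did not survive in this copy. A minimal sketch of what the workaround might look like is shown here, assuming the v2.3 API as described in the spaCy 2.3 release notes: an empty placeholder table named "lexeme_prob" in nlp.vocab.lookups_extra, and iteration over nlp.vocab.vectors yielding the keys of words that have vectors. The helper name load_lexemes_with_vectors is hypothetical and not part of spaCy.

```python
def load_lexemes_with_vectors(nlp):
    """Create a lexeme for every word that has a vector; return the vocab size.

    `nlp` is expected to be a loaded spaCy v2.3 pipeline with vectors,
    e.g. spacy.load("en_core_web_md"). This helper is illustrative only.
    """
    # Assumption: v2.3 models ship an empty placeholder "lexeme_prob" table in
    # nlp.vocab.lookups_extra. Dropping it should let spaCy fall back to the
    # full table from spacy-lookups-data the first time a .prob value is read.
    lookups_extra = nlp.vocab.lookups_extra
    if lookups_extra.has_table("lexeme_prob"):
        lookups_extra.remove_table("lexeme_prob")

    # Iterating over the vectors yields the keys of words with vectors;
    # indexing the vocab with a key creates the lexeme if it is not cached yet.
    for orth in nlp.vocab.vectors:
        _ = nlp.vocab[orth]

    return len(nlp.vocab)


def main():
    # Not called automatically: this needs spaCy 2.3, spacy-lookups-data and
    # the en_core_web_md model to be installed.
    import spacy

    nlp = spacy.load("en_core_web_md")
    print(load_lexemes_with_vectors(nlp))
    print(nlp.vocab["aardvark"].prob)
```

The placeholder table is dropped before any lexemes are created so that no lexeme caches a value from the empty table. Calling main() requires the packages listed in its comment.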
If you save this model with nlp.to_disk(), the probability table is included. The next time you load it, you can skip the step where you drop the empty table, and it doesn't matter whether spacy-lookups-data is installed.
This will be slightly slower than in v2.2. It takes longer to load the prob table (although the initial model loading is now much faster, the overall loading time for model + prob table is slightly higher), and it is slightly slower to access an individual lex.prob value in the lookup table than the v2.2 values stored directly in lexemes.