Deserialization fails based on whether "nlp" object was used yet

When I try to deserialize using a fresh nlp object, spaCy crashes, though it deserializes fine if I use the same nlp object that originally parsed the text. I can't tell whether I'm using the library in a way it isn't intended to be used or whether this is a bug. (This is spaCy version 1.5.0 on Linux with Python 2.7.)

import spacy
from spacy.tokens.doc import Doc

nlp = spacy.load("en")
text = u"Hello world."

# Parse it. Works.
doc = nlp(text)
print(len(set([o for o in doc])))

# Save a serialized copy (binary mode, since to_bytes() returns raw bytes)
with open("out", "wb") as f:
    f.write(doc.to_bytes())

# Read the serialized bytes back once and reuse them below
with open("out", "rb") as f:
    data = f.read()

# Deserialize: works
doc2 = Doc(nlp.vocab)
doc2.from_bytes(data)
print(len(set([o for o in doc2])))

# Deserialize with a fresh nlp object: crashes
nlp = spacy.load("en")
doc3 = Doc(nlp.vocab)
doc3.from_bytes(data)
print(len(set([o for o in doc3])))
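
The reproduction above stores the serialized Doc in a file and reads it back. Serialized output is binary data, so one way to rule out file handling as a source of trouble is to check that the bytes survive the round trip unchanged. The following is only a sketch under stated assumptions: `save_bytes` and `load_bytes` are hypothetical helper names, and `payload` is a stand-in for the result of `doc.to_bytes()` so that the example runs without spaCy installed. It does not address the crash itself, whose apparent resolution is described in the comments.

```python
import os
import tempfile


def save_bytes(path, data):
    """Write raw bytes to path; binary mode avoids any text-mode translation."""
    with open(path, "wb") as f:
        f.write(data)


def load_bytes(path):
    """Read the file back as raw bytes."""
    with open(path, "rb") as f:
        return f.read()


# Stand-in payload (hypothetical); in the report this would be doc.to_bytes().
payload = b"\x00\x01\xff\r\nHello world."

path = os.path.join(tempfile.mkdtemp(), "out")
save_bytes(path, payload)
restored = load_bytes(path)

# The round trip should be byte-for-byte identical.
assert restored == payload
```

If the assertion holds for the real serialized bytes as well, the file round trip can be excluded and attention can turn to the nlp object and its installed model data.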

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

brendano commented, Jan 10, 2017 (1 reaction)

Oh, now it works! I reinstalled the model, and spacy/data now contains both of these directories:

en-1.1.0
en_glove_cc_300_1m_vectors-1.0.0

Before, it only had en-1.1.0 after I installed the model.
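
The comment above reports that the problem went away once both directories were present; the thread does not confirm that the missing vectors directory was the root cause. A quick way to check for the same situation might look like the sketch below. `missing_model_dirs` is a hypothetical helper, the expected names are copied from the comment, and the location of the spacy/data directory on a given machine is an assumption to be filled in by the reader. The demo runs against a temporary directory rather than a real installation.

```python
import os
import tempfile

# Directory names taken from the comment above.
EXPECTED = ("en-1.1.0", "en_glove_cc_300_1m_vectors-1.0.0")


def missing_model_dirs(data_dir, expected=EXPECTED):
    """Return the expected subdirectory names that are absent from data_dir."""
    present = set(
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )
    return [name for name in expected if name not in present]


# Demo on a temporary directory that mimics the "before" state.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "en-1.1.0"))
missing_before = missing_model_dirs(demo_dir)

# After a (simulated) reinstall, the vectors directory is present too.
os.mkdir(os.path.join(demo_dir, "en_glove_cc_300_1m_vectors-1.0.0"))
missing_after = missing_model_dirs(demo_dir)

print(missing_before)
print(missing_after)
```

To use it on a real setup, pass the path of the spacy/data directory of the installed package instead of `demo_dir`.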


