AssertionError in KB.load_bulk
I generated entity and alias files from a Wikipedia dump and loaded them into a KB. I saved it using .dump(), but when I load it again using load_bulk() it throws this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "kb.pyx", line 356, in spacy.kb.KnowledgeBase.load_bulk
File "kb.pyx", line 409, in spacy.kb.KnowledgeBase.load_bulk
AssertionError
I went through the code and saw that the assertion fails because the number of entities loaded is not the same as kb.get_size_entities(). But I don’t understand why, since I am not doing anything beyond kb.dump and kb.load_bulk.
Help would be appreciated. Thanks!
Environment
- Operating System: Ubuntu 18.04.3 LTS
- Python Version Used: 3.8.3
- spaCy Version Used: 2.3.2
Issue Analytics
- State:
- Created 3 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
Sorry, I didn’t update the issue, but I have already solved the problem.
FYI: I did a binary search to find out which entity was causing the problem. It was an entity whose “id” was an empty string (“”). After removing it, the code worked.
Thanks for the help. Closing Issue.
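For anyone hitting the same assertion, a quicker alternative to the binary search may be to scan the entities file for empty ids before building the KB. The snippet below is only a sketch: it assumes each line of entities.jsonl is a JSON object with an "id" field (as described above), and the "description" field and the sample records are made up for illustration. It also reports ids that occur more than once, which might be worth checking if len(kb) comes out lower than the number of lines in the file.

```python
import json
from collections import Counter


def audit_entity_ids(lines, id_field="id"):
    """Return (line numbers with an empty or missing id, ids seen more than once).

    `lines` is any iterable of JSONL strings, e.g. an open file handle.
    `id_field` is assumed to be "id"; adjust it to match your own file.
    """
    empty_lines = []
    counts = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        entity_id = record.get(id_field)
        if entity_id is None or str(entity_id).strip() == "":
            empty_lines.append(lineno)
        else:
            counts[entity_id] += 1
    duplicate_ids = sorted(i for i, n in counts.items() if n > 1)
    return empty_lines, duplicate_ids


# Hypothetical sample records, only to show the shape of the result.
sample = [
    '{"id": "Q42", "description": "writer"}',
    '{"id": "", "description": "broken entry"}',
    '{"id": "Q42", "description": "same id again"}',
    '{"id": "Q1", "description": "universe"}',
]
empty_lines, duplicate_ids = audit_entity_ids(sample)
print("lines with empty id:", empty_lines)
print("duplicate ids:", duplicate_ids)
```

To run it on the real file, pass an open handle instead of the sample list, e.g. audit_entity_ids(open("entities.jsonl", encoding="utf-8")), and drop or fix the reported lines before adding the entities to the KB.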
Thanks for the reply. Here are more details about my code:
The KB is initialized like this:
I have one entities.jsonl file (it has 3968080 entries). The code to add the entities to the KB is:
After running this code, if I do:
len(kb)
I get 3554993 as the result.
I have one aliases file (it has 2095576 entries). The code to add them to the KB is:
After this, when I run:
kb.get_size_aliases()
I get 305379 (I expect that a lot of the aliases might not be related to the entities I am storing).
After this I save the KB using:
kb.dump("/kb")
And then load it using the command:
kb.load_bulk("/kb")
which throws exactly the same error mentioned before, i.e.