
AssertionError in KB.load_bulk


I generated entity and alias files from a Wikipedia dump and loaded them into a KnowledgeBase (KB). I saved it with .dump(), but when I load it again with load_bulk() it throws this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "kb.pyx", line 356, in spacy.kb.KnowledgeBase.load_bulk
  File "kb.pyx", line 409, in spacy.kb.KnowledgeBase.load_bulk
AssertionError

I went through the code and saw that the assertion fails because the number of entities loaded is not the same as kb.get_size_entities(). But I don't understand why, since I am not doing anything beyond kb.dump() and kb.load_bulk().

Help would be appreciated. Thanks!

Environment

  • Operating System: Ubuntu 18.04.3 LTS
  • Python Version Used: 3.8.3
  • spaCy Version Used: 2.3.2

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
nlp-sudo commented, Oct 22, 2020

Sorry, I didn't update the issue, but I have already solved it.

FYI: I did a binary search to find out which entity was causing the problem. It was an entity whose "id" was an empty string (""). After removing it, the code worked.
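The bisection described here can be sketched as a generic helper. This is a minimal sketch, not spaCy API: `find_first_bad` and the `loads_ok` predicate are hypothetical names, and the search assumes that once a prefix of the records fails, every longer prefix also fails (true when a single bad record breaks the round trip). In real use `loads_ok` would build a fresh KB from the prefix, dump it, and return False if load_bulk raises AssertionError; below, a cheap stand-in check for an empty "id" takes its place.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_bad(items: Sequence[T], loads_ok: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of the first item that makes loads_ok fail, or -1.

    Assumes failure is monotonic: if items[:n] fails, items[:m] fails for m > n.
    """
    if loads_ok(items):
        return -1
    lo, hi = 0, len(items)  # invariant: items[:lo] passes, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loads_ok(items[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1  # items[:hi] fails but items[:hi - 1] passes


# Stand-in predicate (hypothetical): a prefix "loads" if no record has an empty id.
records = [{"id": "Q1"}, {"id": "Q2"}, {"id": ""}, {"id": "Q4"}]


def no_empty_ids(prefix):
    return all(r["id"] for r in prefix)


print(find_first_bad(records, no_empty_ids))  # 2
```

With about 4 million records this needs roughly 22 prefix checks, since 2^22 > 3968080.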

Thanks for the help. Closing the issue.

1 reaction
nlp-sudo commented, Sep 16, 2020

Thanks for the reply. Here are more details about my code:

The KB is initialized like this:

import spacy
from spacy.kb import KnowledgeBase

nlp = spacy.load("en_core_web_lg")
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=300)

I have an entities.jsonl file with 3968080 entries. The code to add the entities to the KB is:

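for x in entities:
    # Skip entity IDs that are already in the KB
    if not kb.contains_entity(x["id"]):
        # Args: entity ID, frequency, entity vector (mean of the description's word vectors)
        kb.add_entity(x["id"], 100, nlp.make_doc(x["description"]).vector)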

After adding the entities, len(kb) returns 3554993.
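The gap of 413087 records (3968080 − 3554993) is presumably the repeated IDs that the contains_entity check skips, though the snippet alone does not prove it. Given the resolution reported in the Oct 22, 2020 comment (an entity with an empty "id"), a pre-filter can drop such records before they reach add_entity and tally what was skipped so the totals can be reconciled with len(kb). A minimal sketch, assuming one JSON object per line with an "id" key; `iter_valid_entities` is a hypothetical helper, not part of spaCy.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_valid_entities(lines: Iterable[str], stats: Counter) -> Iterator[dict]:
    """Yield entity records whose "id" is a non-empty string not seen before.

    Counts are recorded in `stats` under total, kept, empty_id and duplicate_id.
    """
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the JSONL file
        record = json.loads(line)
        stats["total"] += 1
        ent_id = record.get("id")
        if not isinstance(ent_id, str) or not ent_id:
            stats["empty_id"] += 1  # e.g. "id": "", the case that broke load_bulk here
            continue
        if ent_id in seen:
            stats["duplicate_id"] += 1  # mirrors the kb.contains_entity() check
            continue
        seen.add(ent_id)
        stats["kept"] += 1
        yield record


sample = [
    '{"id": "Q1", "description": "first"}',
    '{"id": "Q1", "description": "repeat"}',
    '{"id": "", "description": "no id"}',
    '{"id": "Q2", "description": "second"}',
]
stats = Counter()
kept = [r["id"] for r in iter_valid_entities(sample, stats)]
print(kept)  # ['Q1', 'Q2']
print(stats["kept"], stats["duplicate_id"], stats["empty_id"])  # 2 1 1
```

In the entity loop, iterating over iter_valid_entities(f, stats) for an open entities.jsonl file object f would replace iterating over entities, and the contains_entity check would then be redundant.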

I have an aliases file with 2095576 entries. The code to add them to the KB is:

for a in aliases:
    # Keep only candidate entities that exist in the KB, with their probabilities
    ents = []
    prob = []
    for ent, p in zip(a["entities"], a["probabilities"]):
        if kb.contains_entity(ent):
            ents.append(ent)
            prob.append(p)
    n_ents = len(ents)
    if n_ents > 0:
        s = sum(prob)
        # Renormalize the priors; fall back to uniform if they all are 0
        if s == 0:
            prior_prob = [1.0 / n_ents] * n_ents
        else:
            prior_prob = [x / s for x in prob]
        kb.add_alias(alias=a["alias"], entities=ents, probabilities=prior_prob)
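The filtering and renormalization in this loop can be pulled into a pure function and checked without a KB. A minimal sketch: `alias_candidates` is a hypothetical name, and its `in_kb` argument stands in for kb.contains_entity. It returns None when no candidate survives, matching the loop's behavior of skipping that alias.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def alias_candidates(
    entities: Sequence[str],
    probabilities: Sequence[float],
    in_kb: Callable[[str], bool],
) -> Optional[Tuple[List[str], List[float]]]:
    """Keep candidates that pass in_kb and renormalize their priors to sum to 1."""
    kept = [(e, p) for e, p in zip(entities, probabilities) if in_kb(e)]
    if not kept:
        return None  # no candidate is in the KB: skip this alias
    ents = [e for e, _ in kept]
    probs = [p for _, p in kept]
    total = sum(probs)
    if total == 0:
        return ents, [1.0 / len(ents)] * len(ents)  # uniform fallback
    return ents, [p / total for p in probs]


known = {"Q1", "Q2"}
print(alias_candidates(["Q1", "Q2", "Q9"], [0.25, 0.25, 0.5], known.__contains__))
# (['Q1', 'Q2'], [0.5, 0.5])
```

With a real KB, the result would feed kb.add_alias(alias=..., entities=ents, probabilities=priors) whenever it is not None.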

After adding the aliases, kb.get_size_aliases() returns 305379. (I expect that many aliases are not related to the entities I am storing.)

I then save the KB with kb.dump("/kb") and load it again with kb.load_bulk("/kb"), which throws exactly the same error as before:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "kb.pyx", line 356, in spacy.kb.KnowledgeBase.load_bulk
  File "kb.pyx", line 409, in spacy.kb.KnowledgeBase.load_bulk
AssertionError