Problem with handling encoding failure

I noticed that the method _convert_word_to_char_ids found in bilm/data.py can’t handle encoding errors under certain conditions. The problem is in the code chunk below:

        word_encoded = word.encode('utf-8', 'ignore')[:(self.max_word_length-2)]
        code[0] = self.bow_char
        for k, chr_id in enumerate(word_encoded, start=1):
            code[k] = chr_id
        code[k + 1] = self.eow_char

As you can see, if a token consists of a single character that fails to encode, the 'ignore' error handler drops it and word_encoded ends up as an empty byte string. The enumerate for-loop then exits without ever assigning k, so the last line fails with the following error:

UnboundLocalError: local variable 'k' referenced before assignment
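
For anyone who wants to reproduce this outside the library, here is a minimal standalone sketch. It is not the library's code: it uses a plain list instead of a NumPy array, the function is a free function rather than a method, and the default values for max_word_length, bow_char, eow_char and pad_char are placeholders I picked for illustration. A lone surrogate such as '\ud800' is one example of a character that UTF-8 cannot encode, so with 'ignore' it encodes to zero bytes.

```python
def convert_word_to_char_ids(word, max_word_length=50,
                             bow_char=258, eow_char=259, pad_char=260):
    # Placeholder defaults; the real values live on the vocabulary object.
    code = [pad_char] * max_word_length
    word_encoded = word.encode('utf-8', 'ignore')[:(max_word_length - 2)]
    code[0] = bow_char
    for k, chr_id in enumerate(word_encoded, start=1):
        code[k] = chr_id
    # k is only bound if the loop body ran at least once.
    code[k + 1] = eow_char
    return code


try:
    # A lone surrogate cannot be encoded as UTF-8, so 'ignore' drops it.
    convert_word_to_char_ids('\ud800')
except UnboundLocalError as err:
    print('failed as described:', type(err).__name__)
```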

This can be handled with an exception, which could flag the failed token and print a warning. Since I haven’t gone deep into the specifics of the library, I am not sure if this is a proper solution, so I thought I might as well bring this to your attention.
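
One possible shape for such a guard, as a rough sketch only: instead of catching the exception, it indexes the end-of-word marker by len(word_encoded), which is 0 for an empty result, and emits a warning for the affected token. The function name, the list-based code buffer and the default ids are my own assumptions for illustration, and I have not checked how the maintainers would prefer to handle this.

```python
import warnings


def convert_word_to_char_ids_safe(word, max_word_length=50,
                                  bow_char=258, eow_char=259, pad_char=260):
    # Hypothetical standalone variant; defaults are illustrative placeholders.
    code = [pad_char] * max_word_length
    word_encoded = word.encode('utf-8', 'ignore')[:(max_word_length - 2)]
    if not word_encoded:
        # Flag the token so the user can find it in the training data.
        warnings.warn('token %r encoded to zero bytes' % word)
    code[0] = bow_char
    for k, chr_id in enumerate(word_encoded, start=1):
        code[k] = chr_id
    # len(word_encoded) is always defined, unlike the loop variable k.
    code[len(word_encoded) + 1] = eow_char
    return code
```

With this variant an unencodable token comes back as just the begin-of-word and end-of-word markers followed by padding, which may or may not be what the model should see; raising a clearer error is the other obvious option.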

EDIT:

Another thing I have noticed is that an empty file in the training data folder causes the training to fail as soon as that file is processed. This means training could run for days, only to fail on an empty file. To save users the trouble, it would be very kind of you to warn them that empty files cause a problem, or maybe add some logic to safely skip them.
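
In the meantime, a pre-flight check along these lines could be run over the training files before starting a long job. This is a sketch under my own assumptions: the helper name is hypothetical, it is not part of the library, and it only catches files of zero bytes, so a file containing nothing but whitespace would still pass.

```python
import os
import warnings


def drop_empty_files(paths):
    """Return the paths whose files are non-empty, warning about the rest."""
    kept = []
    for path in paths:
        if os.path.getsize(path) == 0:
            warnings.warn('skipping empty training file: %s' % path)
            continue
        kept.append(path)
    return kept
```

The returned list can then be used in place of the original list of training files.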

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6

Top GitHub Comments

qujinqiang commented, Aug 29, 2018

@FynnYoung thanks!