Problem with handling encoding failure
I noticed that the method `_convert_word_to_char_ids` in bilm/data.py can't handle encoding errors under certain conditions. The problem is in the code chunk below:
```python
word_encoded = word.encode('utf-8', 'ignore')[:(self.max_word_length-2)]
code[0] = self.bow_char
for k, chr_id in enumerate(word_encoded, start=1):
    code[k] = chr_id
code[k + 1] = self.eow_char
```
As you can see, if a token consists of a single character that fails to encode, the `word_encoded` variable will be an empty byte string. The for-loop over `enumerate` then exits without ever initializing the `k` variable, so the last line fails with the following error:

```
UnboundLocalError: local variable 'k' referenced before assignment
```
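One possible fix is to initialize `k` before the loop so that an empty encoding still produces a valid begin-of-word/end-of-word pair. The sketch below is a standalone, simplified version of the loop (the `max_word_length` value and the `bow`/`eow`/`pad` ids are made-up placeholders, not the library's actual constants):

```python
# Simplified, hypothetical reconstruction of the loop in
# _convert_word_to_char_ids, with the UnboundLocalError fixed.
MAX_WORD_LENGTH = 10
BOW_CHAR, EOW_CHAR, PAD_CHAR = 258, 259, 260  # placeholder ids

def convert_word_to_char_ids(word):
    code = [PAD_CHAR] * MAX_WORD_LENGTH
    # errors='ignore' silently drops bytes that cannot be encoded,
    # so a single unencodable character yields b''.
    word_encoded = word.encode('utf-8', 'ignore')[:MAX_WORD_LENGTH - 2]
    code[0] = BOW_CHAR
    k = 0  # ensures k is defined even when word_encoded is empty
    for k, chr_id in enumerate(word_encoded, start=1):
        code[k] = chr_id
    code[k + 1] = EOW_CHAR  # for an empty word this lands right after BOW
    return code
```

With this guard, a token that encodes to nothing simply becomes a bare `[BOW, EOW, PAD, ...]` sequence instead of raising.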
This could be handled with an exception handler that flags the failed token and prints a warning. Since I haven't gone deep into the specifics of the library, I am not sure whether that is the proper solution, so I thought I might as well bring it to your attention.
EDIT:
Another thing I have noticed is that empty files in the training data folder cause training to fail once they are processed; training can run for days, only to crash on an empty file. To save users the trouble, it would be very kind of you to either warn them that empty files cause a problem, or add some logic to safely skip them.
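A cheap mitigation, independent of the library, is a pre-flight scan that filters out empty files before training starts. This is only a sketch; the function name and warning format are my own, not part of bilm-tf:

```python
import os

def non_empty_files(data_dir):
    """Return sorted paths of non-empty regular files in data_dir,
    warning about (and skipping) any empty ones."""
    paths = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        if os.path.getsize(path) == 0:
            print(f"Warning: skipping empty training file {path}")
            continue
        paths.append(path)
    return paths
```

Running this once before kicking off training would catch the failure mode up front instead of days in.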
Issue Analytics
- Created 5 years ago
- Comments: 6
Top GitHub Comments
@qujinqiang This blog post may be helpful for you: http://www.linzehui.me/2018/08/12/碎片知识点/如何将ELMo词向量用于中文/ (in Chinese; it covers how to use ELMo word vectors for Chinese text).
@FynnYoung thanks!