
When training the masked LM, are the unmasked words (which have label 0) trained together with the masked words?

See original GitHub issue

According to the code:

    def random_word(self, sentence):
        tokens = sentence.split()
        output_label = []

        for i, token in enumerate(tokens):
            prob = random.random()
            if prob < 0.15:
                # rescale prob so the branches below split this 15% into 80/10/10
                prob /= 0.15

                # 80%: replace the token with the mask token
                if prob < 0.8:
                    tokens[i] = self.vocab.mask_index

                # 10%: replace the token with a random token
                elif prob < 0.9:
                    tokens[i] = random.randrange(len(self.vocab))

                # 10%: keep the current token (as its vocab id)
                else:
                    tokens[i] = self.vocab.stoi.get(token, self.vocab.unk_index)

                output_label.append(self.vocab.stoi.get(token, self.vocab.unk_index))

            else:
                tokens[i] = self.vocab.stoi.get(token, self.vocab.unk_index)
                output_label.append(0)

        return tokens, output_label

Do we need to exclude the unmasked words when training the LM?
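For context, one common way to make the loss skip those label-0 positions in PyTorch is to pass the reserved label as `ignore_index` to the criterion. The sketch below is illustrative only: it assumes a masked-LM head that outputs per-token logits of shape `(batch, seq_len, vocab_size)` and labels built like `output_label` above (0 at non-masked positions); the tensor names and values are hypothetical, not taken from this repository.

    import torch
    import torch.nn as nn

    batch, seq_len, vocab_size = 2, 8, 1000

    # Hypothetical per-token logits from a masked-LM head: (batch, seq_len, vocab_size)
    logits = torch.randn(batch, seq_len, vocab_size)

    # Labels built like output_label above: vocab ids at masked positions, 0 elsewhere
    labels = torch.zeros(batch, seq_len, dtype=torch.long)
    labels[0, 2] = 57    # one masked position in the first sequence
    labels[1, 5] = 913   # one masked position in the second sequence

    # ignore_index=0 drops every position whose label is 0 from the loss,
    # so only the masked tokens are actually trained on
    criterion = nn.CrossEntropyLoss(ignore_index=0)
    loss = criterion(logits.view(-1, vocab_size), labels.view(-1))
    print(loss.item())

`nn.NLLLoss(ignore_index=0)` gives the same effect when the model already outputs log-probabilities.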

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
codertimo commented, Oct 23, 2018

@coddinglxf that’s what I thought at first, but I couldn’t find a way to implement it efficiently in terms of GPU computation time. If you have an idea, please implement it and open a pull request 😃 It would be really cool to do 👍
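On the efficiency point: excluding the non-masked positions can be done fully vectorized, so on the GPU it costs little more than the loss computation itself. A rough sketch of one such approach, reusing the hypothetical shapes from the sketch above (per-token losses are masked and averaged over the masked positions only; all names are illustrative):

    import torch
    import torch.nn.functional as F

    vocab_size = 1000
    logits = torch.randn(2, 8, vocab_size)        # (batch, seq_len, vocab_size)
    labels = torch.zeros(2, 8, dtype=torch.long)  # 0 marks non-masked positions
    labels[0, 2], labels[1, 5] = 57, 913

    # Per-token cross-entropy without reduction, then zero out non-masked positions
    per_token = F.cross_entropy(
        logits.view(-1, vocab_size), labels.view(-1), reduction="none"
    ).view(labels.shape)
    mask = (labels != 0).float()
    loss = (per_token * mask).sum() / mask.sum().clamp(min=1)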

0 reactions
codertimo commented, Oct 30, 2018

@leon-cas yes, your question is solved in #36.

Read more comments on GitHub >

