When training the masked LM, are the unmasked words (which have label 0) trained together with the masked words?
Issue Description
According to the code:
def random_word(self, sentence):
    tokens = sentence.split()
    output_label = []

    for i, token in enumerate(tokens):
        prob = random.random()
        if prob < 0.15:
            prob /= 0.15  # rescale to [0, 1) within the selected 15%

            # 80%: replace token with the mask token
            if prob < 0.8:
                tokens[i] = self.vocab.mask_index

            # 10%: replace token with a random token
            elif prob < 0.9:
                tokens[i] = random.randrange(len(self.vocab))

            # 10%: keep the current token
            else:
                tokens[i] = self.vocab.stoi.get(token, self.vocab.unk_index)

            output_label.append(self.vocab.stoi.get(token, self.vocab.unk_index))
        else:
            tokens[i] = self.vocab.stoi.get(token, self.vocab.unk_index)
            output_label.append(0)

    return tokens, output_label
Do we need to exclude the unmasked words when training the LM?
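To the question above: the unmasked positions carry label 0, and they can be excluded from the loss by treating 0 as an ignore index, so they contribute no gradient even though they pass through the model. PyTorch's `nn.NLLLoss` supports exactly this via its `ignore_index` argument. The following is an illustrative plain-Python sketch of that exclusion, not the repository's actual code:

```python
import math

# Illustrative sketch: excluding label-0 positions from the MLM loss,
# mimicking the behavior of nn.NLLLoss(ignore_index=0).
def masked_nll(log_probs, labels, ignore_index=0):
    """Mean negative log-likelihood over positions whose label != ignore_index."""
    losses = [-lp[y] for lp, y in zip(log_probs, labels) if y != ignore_index]
    return sum(losses) / len(losses) if losses else 0.0

# Three positions, each with a uniform distribution over a 5-word vocabulary;
# position 0 is unmasked (label 0) and is skipped entirely.
log_probs = [[math.log(0.2)] * 5 for _ in range(3)]
labels = [0, 3, 2]
loss = masked_nll(log_probs, labels)  # averaged over the 2 labeled positions only
```

Because only the two non-zero-labeled positions are scored, the unmasked word never influences the loss, which is the exclusion the question asks about.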
Issue Analytics
- Created: 4 years ago
- Comments: 6 (3 by maintainers)
Top Results From Across the Web
- bert-large-uncased-whole-word-masking - Hugging Face: "The training is identical -- each masked WordPiece token is ... This means it was pretrained on the raw texts only, with no..."
- Masked-Language Modeling With BERT | by James Briggs: "The BERT paper uses a 15% probability of masking each token during model pre-training, with a few additional rules ..."
- STRUCTBERT - OpenReview: "... new word objective is jointly trained together with the original masked LM ..."
- Unmasking BERT: The Key to Transformer Model Performance: "This is why we say that MLMs are 'bidirectional' since they have access to words that are before and after the current word ..."
- Time Masking for Temporal Language Models - arXiv: "At the heart of the masked language modeling (MLM) approach is the task of predicting ... enables the learning of word representations that ..."
@coddinglxf that's what I thought at first, but I couldn't implement it efficiently in terms of GPU computation time. If you have any idea, please implement it and send a pull request 😃 It would be really cool to do it 👍
@leon-cas yes, your question is resolved in #36.
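On the efficiency concern raised in the comment above: one common pattern is to flatten the batch and keep only the positions whose label is non-zero, so that any per-masked-token work touches masked tokens only. With tensors this is a single boolean-mask operation; the sketch below shows the same idea in plain Python, with `gather_masked` and its arguments being illustrative names, not the repository's API:

```python
# Illustrative sketch: select only the positions chosen for prediction
# (label != 0) across a whole batch, pairing each hidden state with the
# original token id it should predict.
def gather_masked(batch_labels, batch_hidden):
    """Return (hidden_state, target_id) pairs for masked positions only."""
    pairs = []
    for labels, hidden in zip(batch_labels, batch_hidden):
        for y, h in zip(labels, hidden):
            if y != 0:  # 0 marks positions not selected for prediction
                pairs.append((h, y))
    return pairs

# Two sentences of three tokens each; only two positions were masked.
batch_labels = [[0, 3, 0], [2, 0, 0]]
batch_hidden = [["h00", "h01", "h02"], ["h10", "h11", "h12"]]
selected = gather_masked(batch_labels, batch_hidden)
```

In a tensor framework the same selection can be expressed without Python loops (e.g. indexing with a boolean mask), which is what makes the exclusion cheap on GPU.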