How does Dropout2d help in the cloze task?
```python
import torch
import torch.nn as nn


class ClfHead(nn.Module):
    """ Classifier Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(ClfHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        # To reproduce the noise_shape parameter of the TF implementation
        self.dropout = nn.Dropout2d(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, 1)
        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        # Classification logits
        clf_h = h.view(-1, self.n_embd)
        flat = x[:, :, :, 0].contiguous().view(-1)
        # Keep only the hidden states at the positions of the classification token
        clf_h = clf_h[flat == self.clf_token, :]
        # (batch, n_choices, n_embd, 1): n_choices becomes the channel dim for Dropout2d
        clf_h = clf_h.view(-1, x.size(1), self.n_embd, 1)
        clf_h = self.dropout(clf_h)
        clf_h = clf_h.view(-1, self.n_embd)
        clf_logits = self.linear(clf_h)
        return clf_logits.view(-1, x.size(1))
```
Here self.dropout(clf_h) essentially removes the entire representation of a sentence and its conclusion, and there is a small chance (0.2 × 0.2 = 0.04) that both representations get removed for a given data item. I am confused about how this aids training.
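To see this concretely, here is a minimal, self-contained sketch (the sizes are made up for illustration) showing that nn.Dropout2d treats dimension 1 as channels, so with the (batch, n_choices, n_embd, 1) layout above it zeroes whole per-choice vectors rather than individual activations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, n_choices, n_embd = 4, 2, 8  # made-up sizes for illustration
clf_h = torch.ones(batch, n_choices, n_embd, 1)

drop = nn.Dropout2d(p=0.2)
drop.train()  # dropout is only active in training mode
out = drop(clf_h)

# Each (n_embd, 1) "channel" is either all zeros or all 1 / (1 - p):
# a whole sentence representation survives or is wiped out together.
print(out.squeeze(-1))
```

With p = 0.2 and two choices per item, each choice's vector is zeroed independently, so both are lost with probability 0.2 × 0.2 = 0.04, which is exactly the situation described above.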
Top GitHub Comments
Instead of nn.Dropout, we could create a mask using torch.bernoulli and apply it to both sentences with a broadcasted multiply (see https://discuss.pytorch.org/t/how-to-fix-the-dropout-mask-for-different-batch). We have to make sure that the scaling is done, and that it is not applied in eval mode.
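Here is a minimal sketch of that idea, assuming a hypothetical helper named shared_dropout and a (batch, n_choices, n_embd) layout chosen for illustration; it is the commenter's suggestion, not necessarily the code in the pull request. One Bernoulli mask is sampled over the embedding dimensions per item, broadcast across the choice dimension so both sentences share it, rescaled by 1 / (1 - p), and skipped entirely in eval mode:

```python
import torch

def shared_dropout(clf_h: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Drop the same embedding dimensions for every choice of an item.

    clf_h: (batch, n_choices, n_embd) -- hypothetical layout for illustration.
    """
    if not training or p == 0.0:
        return clf_h  # no-op at eval time
    # One mask per item over n_embd, shared across the choice dimension.
    keep = torch.full((clf_h.size(0), 1, clf_h.size(2)), 1 - p, device=clf_h.device)
    mask = torch.bernoulli(keep)
    # Inverted-dropout scaling keeps the expected activation unchanged.
    return clf_h * mask / (1 - p)
```

Because the mask is shared, an item never loses one answer's representation while keeping the other, and the 1 / (1 - p) scaling plus the training-mode check reproduce what nn.Dropout does internally.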
I created the pull request for this fix. I ran a few trainings just to be sure that it is working, and I got the following results:
- Seed 42
- Seed 43
- Seed 44