
How does Dropout2d help in cloze task?

See original GitHub issue
class ClfHead(nn.Module):
    """ Classifier Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(ClfHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout2d(cfg.clf_pdrop)  # To reproduce the noise_shape parameter of TF implementation
        self.linear = nn.Linear(cfg.n_embd, 1)
        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        # Classification logits
        clf_h = h.view(-1, self.n_embd)
        # x[..., 0] holds the token ids; keep only the positions of the [clf] token
        flat = x[:, :, :, 0].contiguous().view(-1)
        clf_h = clf_h[flat == self.clf_token, :]
        # Reshape to (batch, n_choices, n_embd, 1) so Dropout2d treats each
        # candidate's embedding as a single channel and drops it as a whole
        clf_h = clf_h.view(-1, x.size(1), self.n_embd, 1)
        clf_h = self.dropout(clf_h)
        clf_h = clf_h.view(-1, self.n_embd)
        clf_logits = self.linear(clf_h)
        return clf_logits.view(-1, x.size(1))

Here self.dropout(clf_h) zeroes out the entire representation of a sentence together with one of its candidate endings, and there is even a small chance (0.2 × 0.2 = 0.04) that both candidates' representations get removed for a given data item. I am confused about how this aids training.
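A minimal sketch (assuming a PyTorch install; the shapes and p=0.5 are illustrative, not the repo's values) of the behavior the question describes: with input shaped (batch, n_choices, n_embd, 1), nn.Dropout2d treats each candidate's (n_embd, 1) embedding as one channel, so the whole embedding is either kept (scaled by 1/(1-p)) or zeroed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout2d(p=0.5)
drop.train()

# (batch=4, n_choices=2, n_embd=8, 1): one embedding per candidate ending
h = torch.ones(4, 2, 8, 1)
out = drop(h)

# Each (n_embd, 1) slice is either entirely zero or entirely scaled by 1/(1-p)=2
for b in range(4):
    for c in range(2):
        vals = out[b, c].unique()
        assert len(vals) == 1 and vals.item() in (0.0, 2.0)

# In eval mode dropout is a no-op
drop.eval()
assert torch.equal(drop(h), h)
```

Since the two candidates are independent channels, both can be dropped at once, which is exactly the 0.2 × 0.2 corner case raised above.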

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:12 (6 by maintainers)

Top GitHub Comments

1 reaction
sai-prasanna commented, Jul 6, 2018

Instead of nn.Dropout, could we create a mask using torch.bernoulli and apply it to both sentences with a broadcasted multiply? https://discuss.pytorch.org/t/how-to-fix-the-dropout-mask-for-different-batch

We have to make sure that the scaling is done, and that the mask is not applied in eval mode.
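The suggestion above can be sketched as follows. This is a hypothetical SharedMaskDropout module (not the code merged in the eventual fix): one Bernoulli mask is sampled per example and broadcast over all candidate endings, with inverted-dropout scaling and a no-op in eval mode:

```python
import torch
import torch.nn as nn

class SharedMaskDropout(nn.Module):
    """Hypothetical sketch: one Bernoulli mask per example, broadcast over
    all candidate endings so both sentences keep or lose the same units."""

    def __init__(self, p):
        super().__init__()
        self.p = p

    def forward(self, h):
        # h: (batch, n_choices, n_embd)
        if not self.training or self.p == 0:
            return h  # no dropout at eval time
        keep = 1 - self.p
        # One mask per (batch, n_embd) position, broadcast over the choice dim
        mask = torch.bernoulli(h.new_full((h.size(0), 1, h.size(2)), keep))
        return h * mask / keep  # inverted-dropout scaling

drop = SharedMaskDropout(0.2)
drop.train()
h = torch.ones(4, 2, 16)
out = drop(h)
# Both candidates of each example share the same mask
assert torch.equal(out[:, 0], out[:, 1])
```

Because the mask is shared, a unit is never dropped for one candidate but kept for the other, avoiding the asymmetry (and the both-dropped case) of per-channel Dropout2d.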

0 reactions
rodgzilla commented, Jul 12, 2018

I created the pull request for this fix.

I ran a few training runs just to be sure that it works, and I get the following results:

Seed 42

device cuda n_gpu 1
Encoding dataset...
Loading weights...                                                              
running epoch 0
Logging                                                                         
1 187 4.358 7.397 91.18 84.49                                                   
running epoch 1
Logging                                                                         
2 374 0.807 8.412 99.20 90.37                                                   
running epoch 2
Logging                                                                         
3 561 0.000 20.528 100.00 90.11                                                 
ROCStories Valid Accuracy: 90.37                                                
ROCStories Test Accuracy:  87.17

Seed 43

device cuda n_gpu 1
Encoding dataset...
Loading weights...                                                              
running epoch 0
Logging                                                                         
1 187 1.390 8.253 96.52 89.30                                                   
running epoch 1
Logging                                                                         
2 374 0.098 13.438 99.73 90.91                                                  
running epoch 2
Logging                                                                         
3 561 0.000 16.577 100.00 91.18                                                 
ROCStories Valid Accuracy: 91.18                                                
ROCStories Test Accuracy:  87.17

Seed 44

device cuda n_gpu 1
Encoding dataset...
Loading weights...                                                              
running epoch 0
Logging                                                                         
1 187 3.236 7.552 91.18 83.69                                                   
running epoch 1
Logging                                                                         
2 374 1.036 12.025 98.66 86.36                                                  
running epoch 2
Logging                                                                         
3 561 0.055 17.220 99.73 86.90                                                  
ROCStories Valid Accuracy: 86.90                                                
ROCStories Test Accuracy:  84.66
