How does Dropout2d help in the cloze task?
```python
import torch
import torch.nn as nn


class ClfHead(nn.Module):
    """ Classifier Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(ClfHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        # To reproduce the noise_shape parameter of the TF implementation
        self.dropout = nn.Dropout2d(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, 1)
        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        # Classification logits
        clf_h = h.view(-1, self.n_embd)
        flat = x[:, :, :, 0].contiguous().view(-1)
        # Keep only the hidden states at the positions of the classification token
        clf_h = clf_h[flat == self.clf_token, :]
        # (batch, n_choices, n_embd, 1): n_choices becomes the channel dim for Dropout2d
        clf_h = clf_h.view(-1, x.size(1), self.n_embd, 1)
        clf_h = self.dropout(clf_h)
        clf_h = clf_h.view(-1, self.n_embd)
        clf_logits = self.linear(clf_h)
        return clf_logits.view(-1, x.size(1))
```
Here self.dropout(clf_h) essentially removes the entire representation of a sentence and its conclusion, and there is a small chance (0.2 × 0.2 = 0.04) that both representations get removed for a given data item. I am confused about how this aids training.
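To see this concretely, here is a minimal, self-contained sketch (the sizes are made up for illustration) showing that nn.Dropout2d treats dimension 1 as channels, so with the (batch, n_choices, n_embd, 1) layout above it zeroes whole per-choice vectors rather than individual activations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, n_choices, n_embd = 4, 2, 8  # made-up sizes for illustration
clf_h = torch.ones(batch, n_choices, n_embd, 1)

drop = nn.Dropout2d(p=0.2)
drop.train()  # dropout is only active in training mode
out = drop(clf_h)

# Each (n_embd, 1) "channel" is either all zeros or all 1 / (1 - p):
# a whole sentence representation survives or is wiped out together.
print(out.squeeze(-1))
```

With p = 0.2 and two choices per item, each choice's vector is zeroed independently, so both are lost with probability 0.2 × 0.2 = 0.04, which is exactly the situation described above.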
Top GitHub Comments
Instead of nn.Dropout, we could create a mask using torch.bernoulli and apply it to both sentences with a broadcasted multiply (see https://discuss.pytorch.org/t/how-to-fix-the-dropout-mask-for-different-batch). We have to make sure that the scaling is done, and that it is not applied in eval mode.
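Here is a minimal sketch of that idea, assuming a hypothetical helper named shared_dropout and a (batch, n_choices, n_embd) layout chosen for illustration; it is the commenter's suggestion, not necessarily the code in the pull request. One Bernoulli mask is sampled over the embedding dimensions per item, broadcast across the choice dimension so both sentences share it, rescaled by 1 / (1 - p), and skipped entirely in eval mode:

```python
import torch

def shared_dropout(clf_h: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Drop the same embedding dimensions for every choice of an item.

    clf_h: (batch, n_choices, n_embd) -- hypothetical layout for illustration.
    """
    if not training or p == 0.0:
        return clf_h  # no-op at eval time
    # One mask per item over n_embd, shared across the choice dimension.
    keep = torch.full((clf_h.size(0), 1, clf_h.size(2)), 1 - p, device=clf_h.device)
    mask = torch.bernoulli(keep)
    # Inverted-dropout scaling keeps the expected activation unchanged.
    return clf_h * mask / (1 - p)
```

Because the mask is shared, an item never loses one answer's representation while keeping the other, and the 1 / (1 - p) scaling plus the training-mode check reproduce what nn.Dropout does internally.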
I created the pull request for this fix. I ran a few trainings just to be sure that it is working, and I got the following results:
- Seed 42
- Seed 43
- Seed 44