
Some questions about dropout


Hi again @lucidrains, I have some quick questions about dropout in the Sinkhorn Transformer. I was just using my Linformer implementation (which, as you know, is based off of this repo), and it was overfitting my dataset. So I wanted to ask about dropout in your implementation, and whether some design choices here were intentional or not:

  1. In the original Transformer, dropout was applied after each sublayer, before the residual connection. I noticed that you only have this after the SinkhornSelfAttention class, but not after the FeedForward class. Is this intentional?
  2. Speaking of the FeedForward class, you insert dropout after the first linear layer. I couldn't find this anywhere in the literature; were you able to find a reference for why this is effective? I put it into my implementation and it seems to help, but I just don't know where the idea came from.
  3. On a similar note, do you know why the dots tensor in the self-attention classes is dropped out? Again, I put it in my Linformer and it seems to work, but I can't find a reference for it in the literature.
  4. Finally, the original Transformer also applied dropout to the summed token and positional embeddings, like so (from the SinkhornTransformerLM class):
    def forward(self, x, **kwargs):
        _, t, device = *x.shape, x.device
        assert t <= self.max_seq_len, f'sequence length {t} is greater than maximum sequence length {self.max_seq_len}'

        x = self.to_token_emb(x)
        x = self.axial_pos_emb(x) + x
        """ Dropout would go here"""
        x = self.sinkhorn_transformer(x, **kwargs)
        return self.to_logits(x)

Should the embeddings be dropped out here as well?
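For reference, here is a toy PyTorch sketch showing all four dropout placements in one model. Everything here (class names, dimensions, the plain dot-product attention) is an illustrative stand-in, not code from either repo; the numbered comments map each `nn.Dropout` back to the four questions above.

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    """Plain dot-product attention; stand-in for the real sparse attention."""
    def __init__(self, dim, dropout=0.1):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.attn_dropout = nn.Dropout(dropout)  # (3) dropout on the attention weights ("dots")
        self.to_out = nn.Linear(dim, dim)
        self.out_dropout = nn.Dropout(dropout)   # (1) dropout after the sublayer, before the residual

    def forward(self, x):
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        dots = torch.einsum('bid,bjd->bij', q, k) * (q.shape[-1] ** -0.5)
        attn = self.attn_dropout(dots.softmax(dim=-1))
        out = self.to_out(torch.einsum('bij,bjd->bid', attn, v))
        return self.out_dropout(out)

class TinyFeedForward(nn.Module):
    def __init__(self, dim, mult=4, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * mult),
            nn.GELU(),
            nn.Dropout(dropout),        # (2) dropout after the first linear layer
            nn.Linear(dim * mult, dim),
            nn.Dropout(dropout),        # (1) again: dropout before the residual connection
        )

    def forward(self, x):
        return self.net(x)

class TinyLM(nn.Module):
    def __init__(self, num_tokens, dim, max_seq_len, dropout=0.1):
        super().__init__()
        self.token_emb = nn.Embedding(num_tokens, dim)
        self.pos_emb = nn.Embedding(max_seq_len, dim)
        self.emb_dropout = nn.Dropout(dropout)   # (4) dropout on the summed embeddings
        self.attn = TinyAttention(dim, dropout)
        self.ff = TinyFeedForward(dim, dropout=dropout)
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, x):
        n = x.shape[1]
        x = self.token_emb(x) + self.pos_emb(torch.arange(n, device=x.device))
        x = self.emb_dropout(x)
        x = x + self.attn(x)  # residual around the (dropped-out) attention output
        x = x + self.ff(x)    # residual around the (dropped-out) feed-forward output
        return self.to_logits(x)

model = TinyLM(num_tokens=100, dim=32, max_seq_len=16)
logits = model(torch.randint(0, 100, (2, 16)))
print(tuple(logits.shape))  # (2, 16, 100)
```

Note that all four dropouts are active only in training mode; calling `model.eval()` disables them, so they only act as a regularizer during fitting.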

I've now updated my repo so that all 4 of these dropout possibilities exist. I'll let you know if this helps with the overfitting.

Thank you for your time!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
lucidrains commented, Sep 16, 2020

@tatp22 Nice! Thank you for sharing your experience 😃

1 reaction
tatp22 commented, Sep 16, 2020

@lucidrains Right now, I’m only working on seq lengths of 2048, and I am planning to scale it to 8096 soonish. To be honest, I did try a (practical) sequence length of 250k+ (with a k of 150), and it did end up competing with baselines, but I am not pursuing these experiments further atm.

From what I have experienced though, practically, the Linformer works very well (with respect to the standard transformer), even if the k is very small. However, one thing that one must watch out for is that the parameter numbers can seriously explode, especially with longer sequences (compared to standard attention, and even with this repo).
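The parameter growth is easy to see with a back-of-the-envelope count of Linformer's extra projection matrices, which are sized sequence-length-by-k and so scale linearly with sequence length. The sketch below is an illustration under an assumed sharing scheme (one key projection and one value projection per layer, shared across heads); the actual count depends on how much sharing an implementation uses.

```python
def linformer_extra_params(seq_len, k, layers, projections_per_layer=2):
    """Extra parameters from Linformer's n-by-k projections, assuming one
    projection each for keys and values per layer, shared across heads.
    Standard dot-product attention adds no parameters that grow with seq_len."""
    return seq_len * k * layers * projections_per_layer

print(linformer_extra_params(seq_len=2048, k=150, layers=12))     # 7,372,800
print(linformer_extra_params(seq_len=250_000, k=150, layers=12))  # 900,000,000
```

So at a sequence length of 250k with k=150, the projections alone approach a billion parameters, which matches the observation that parameter counts can explode at long sequence lengths even while the attention itself stays linear.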

Personally, I have a feeling that attention in general does not need to be quadratic (in time and space), and there may just be better architectures out there that are faster and more memory efficient. Unfortunately, I am not really in a position to investigate this at the moment, due to time limitations.
