Request for help with LSHSelfAttention()
See original GitHub issue
Hi @lucidrains, thank you for your excellent work (I starred it).
I am trying to use the LSHSelfAttention() layer in my network in place of my Transformer encoder layer.
Pseudocode of what I am doing:
word_embeddings = self.word_embeddings(input)            # (batch, seq_len, emb_dim)
lsh_encoded = self.lsh_self_attention(word_embeddings)   # (batch, seq_len, emb_dim)
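Concretely, a minimal sketch of how I am wiring the layer up, following the reformer_pytorch README; the dim / heads / bucket_size / n_hashes values below are placeholders, not my real configuration:

```python
import torch
from reformer_pytorch import LSHSelfAttention

# placeholder hyperparameters, not my real configuration
attn = LSHSelfAttention(
    dim = 128,          # embedding dimension
    heads = 8,
    bucket_size = 64,
    n_hashes = 8,
    causal = False
)

x = torch.randn(1, 1024, 128)   # (batch, seq_len, emb_dim)
out = attn(x)                   # (1, 1024, 128)
```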
I keep getting a vector of NaN values. To avoid it I decreased my learning rate from 1e-3 to 1e-5, but nothing changed.
- Am I using the correct layer?
- Should I use Reformer() instead of LSHSelfAttention()? I tried Reformer() as well, but there I get an error telling me that my sequence length must be divisible by the number of buckets (I'm still working on it; see the padding sketch below).
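To show what I mean by the divisibility error, this is roughly how I am trying to work around it: pad the embedded sequence up to a multiple of the bucket chunk length before passing it in. The multiple of bucket_size * 2 and the zero padding are my assumptions from reading the error message, not something I have confirmed; I believe reformer_pytorch also has an Autopadder wrapper that handles this automatically.

```python
import torch.nn.functional as F

def pad_to_multiple(x, multiple, value = 0.0):
    # x: (batch, seq_len, emb_dim) embeddings
    seq_len = x.shape[1]
    remainder = seq_len % multiple
    if remainder == 0:
        return x
    # pad the end of the sequence dimension up to the next multiple
    return F.pad(x, (0, 0, 0, multiple - remainder), value = value)

# assumption: with bucket_size = 64 the sequence must be a multiple of 64 * 2
# word_embeddings = pad_to_multiple(word_embeddings, 64 * 2)
```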
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 22 (13 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Woohoo! Congrats 💯
@lucidrains I am using pytorch_lightning==0.8.5, so, looking at the default parameter of the Trainer() class (located in $HOME/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py), I suppose it is set to O2 (see the Trainer sketch below).
PS: It works with lr = 1e-3 without problems.
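For completeness, this is roughly how I would force the Trainer out of the apex O2 path to rule mixed precision out as the source of the NaNs; the argument names are taken from the pytorch_lightning==0.8.5 trainer.py signature as I read it, so treat them as assumptions:

```python
from pytorch_lightning import Trainer

# stay in full fp32 so the apex amp path is never used (assumption)
trainer = Trainer(precision = 32)

# or keep 16-bit but try the more conservative apex level (assumption)
trainer = Trainer(precision = 16, amp_level = 'O1')
```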