
Request for help with LSHSelfAttention()

See original GitHub issue

Hi @lucidrains, thank you for your excellent work (I starred it).

I am trying to use the LSHSelfAttention() layer in my network instead of my transformer encoder layer.

Pseudocode of what I am doing:

word_embeddings = self.word_embedding(input_ids)         # (batch, seq_len, emb_dim)
lsh_encoded = self.lsh_self_attention(word_embeddings)   # (batch, seq_len, emb_dim)

I continuously get a vector of NaN values. To avoid it I decreased my learning rate from 1e-3 to 1e-5, but nothing changed.

  1. Am I using the correct layer?
  2. Should I use Reformer() instead of LSHSelfAttention()? I tried to use Reformer(), but I get an error there telling me that my sequence length must be divisible by the number of buckets (I’m still working on it).
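
For reference, a minimal standalone sketch of LSHSelfAttention() along the lines of the reformer_pytorch README; the concrete sizes (dim, heads, bucket_size, seq_len) are only illustrative, and the sequence length is chosen to divide evenly into buckets, which, as far as I understand, is the constraint behind the error in question 2:

import torch
from reformer_pytorch import LSHSelfAttention

# Illustrative sizes; dim must match the embedding dimension of the inputs.
attn = LSHSelfAttention(
    dim = 128,         # emb_dim of the word embeddings
    heads = 8,
    bucket_size = 64,  # seq_len should be a multiple of 2 * bucket_size
    n_hashes = 8,
    causal = False
)

x = torch.randn(1, 1024, 128)  # (batch, seq_len, emb_dim)
out = attn(x)                  # (batch, seq_len, emb_dim)

If the NaNs persist even at lr = 1e-5, it is worth checking whether mixed precision (see the amp_level discussion in the comments below) rather than the attention layer itself is the culprit.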

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 22 (13 by maintainers)

Top GitHub Comments

1 reaction
lucidrains commented, Dec 11, 2020

Woohoo! Congrats 💯

1 reaction
andreabac3 commented, Dec 10, 2020

@lucidrains I am using pytorch_lightning==0.8.5, so, looking at the default parameters of the Trainer() class, I suppose it’s set to O2:

amp_level: str = 'O2',   # backward compatible, todo: remove in v1.0.0

which is located in $HOME/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py

PS: It works with lr = 1e-3 without problems.
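
For what it’s worth, a minimal sketch of how that default could be overridden explicitly, assuming the pytorch_lightning 0.8.x Trainer API quoted above (where, as I read it, Apex mixed precision only kicks in when precision=16):

from pytorch_lightning import Trainer

# Train in full fp32 precision, ruling out O2 mixed precision
# as the source of the NaNs (32 is the default for precision):
trainer = Trainer(precision=32)

# Or keep 16-bit training but with the more conservative O1 amp level:
# trainer = Trainer(precision=16, amp_level='O1')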

Read more comments on GitHub.

Top Results From Across the Web

Illustrated: Self-Attention
This article walks you through the mathematical operations in a self-attention module. Includes illustrations and code.

Complete Self-Attention from Scratch
This vignette describes how to implement the attention mechanism, which forms the basis of transformers, in the…

tf.keras.layers.Attention | TensorFlow v2.11.0
Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout). use_causal_mask…

Attention? Attention! - Lil'Log
Self-attention, also known as intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a…

The Transformer Attention Mechanism
Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to…
