Feature request: pass positions of sequence items in AttentionLayers forward pass
In some applications it is desirable to reference sequence items by arbitrary positions instead of ascending ordinals (torch.arange(max_seq_len...)).

Would it be possible to add a new parameter pos to the forward pass of AttentionLayers?

I am not sure whether all positional encodings would be compatible, but the idea would be something like:
import torch
from x_transformers import Decoder, Encoder

enc = Encoder(
    dim = 512,
    depth = 2,
    heads = 8,
    rotary_pos_emb = True,
)

# equivalent to the current behavior (implicit ascending ordinals)
enc(torch.rand(2, 5, 512), pos = torch.arange(5).unsqueeze(0))

# arbitrary (fractional / non-uniform) positions become possible
enc(torch.rand(2, 5, 512), pos = torch.tensor([[0, 1, 2, 3, 4.1], [-1.5, 0, 2, 3.5, 4]]))
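For the rotary case in the example above, the change would essentially amount to building the rotation frequencies from the supplied pos tensor rather than from torch.arange. A minimal sketch of that idea, using a hypothetical rotary_freqs helper (not part of the x-transformers API):

import torch

def rotary_freqs(pos, dim, theta = 10000):
    # pos: (batch, seq_len) tensor of positions, integer or fractional
    # hypothetical helper -- illustrates the idea, not library code
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    freqs = pos[..., None].float() * inv_freq   # (batch, seq_len, dim // 2)
    return torch.cat((freqs, freqs), dim = -1)  # (batch, seq_len, dim)

# ascending ordinals reproduce the usual rotary frequencies
rotary_freqs(torch.arange(5).unsqueeze(0), dim = 64)

# arbitrary (fractional, non-uniform) positions plug in the same way
rotary_freqs(torch.tensor([[0, 1, 2, 3, 4.1]]), dim = 64)

Since the frequencies are just an outer product of positions and inverse frequencies, fractional or non-uniform positions drop in without further changes to the attention computation itself.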
Issue Analytics
- Created a year ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Yes, but for AttentionLayers (not just TransformerWrapper as in the PR). Also, this would be more attractive if it worked with the other, more performant (relative) positional embeddings/encodings (T5, rotary, etc.).

@antorsae or do you mean deriving the relative positional encoding based on the absolute positions of the tokens passed in on forward?
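If that is the intent, the relative quantities could in principle be derived from the supplied absolute positions as pairwise differences. A minimal sketch under that assumption (relative_position_matrix is a hypothetical illustration, not x-transformers code):

import torch

def relative_position_matrix(pos):
    # pos: (batch, seq_len) absolute positions, integer or fractional
    # pairwise differences: rel[b, i, j] = pos[b, i] - pos[b, j]
    return pos[:, :, None] - pos[:, None, :]    # (batch, seq_len, seq_len)

pos = torch.tensor([[-1.5, 0., 2., 3.5, 4.]])
rel = relative_position_matrix(pos)             # could feed a relative bias scheme

A T5-style bias would additionally need to bucket these (possibly fractional) distances into discrete bins, whereas rotary encodes the position difference directly through its rotation angles.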