
Feature request: pass positions of sequence items in AttentionLayers forward pass


In some applications it is desirable to reference sequence items by arbitrary positions (for example, the timestamps of an irregularly sampled time series) instead of ascending ordinals (torch.arange(max_seq_len...)).

Would it be possible to add a new parameter pos to the forward pass of AttentionLayers?

I am not sure whether all positional encodings would be compatible, but the idea would be something like:

import torch
from x_transformers import Encoder

enc = Encoder(
    dim = 512,
    depth = 2,
    heads = 8,
    rotary_pos_emb = True,
)

# current behavior is equivalent to ascending integer ordinals
enc(torch.rand(2, 5, 512), pos = torch.arange(5).unsqueeze(0))

# arbitrary (float) positions become possible
enc(torch.rand(2, 5, 512), pos = torch.tensor([[0, 1, 2, 3, 4.1], [-1.5, 0, 2, 3.5, 4]]))

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
antorsae commented, Aug 1, 2022

Yes, but for AttentionLayers (not just TransformerWrapper as in the PR).

Also, this would be more attractive if it worked with the other more performant (relative) positional embeddings/encodings (T5, rotary, etc.).
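For the T5-style bias, one way such support could look is sketched below. This is a hypothetical illustration (the ContinuousRelativeBias module is invented for this sketch, not x-transformers' actual code): pairwise offsets are computed directly from the supplied positions, and T5's integer bucketing is replaced by a small MLP so non-integer offsets are handled naturally.

import torch
import torch.nn as nn

class ContinuousRelativeBias(nn.Module):
    # hypothetical T5-style additive attention bias for arbitrary float positions
    def __init__(self, heads, hidden = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, heads)
        )

    def forward(self, pos):
        # pos: (batch, seq_len) arbitrary positions, e.g. timestamps
        rel = pos[:, :, None] - pos[:, None, :]      # pairwise offsets, (b, n, n)
        bias = self.mlp(rel.unsqueeze(-1))           # (b, n, n, heads)
        return bias.permute(0, 3, 1, 2)              # (b, heads, n, n), added to attention logits

pos = torch.tensor([[0., 1., 2., 3., 4.1], [-1.5, 0., 2., 3.5, 4.]])
bias = ContinuousRelativeBias(heads = 8)(pos)        # shape (2, 8, 5, 5)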

0 reactions
lucidrains commented, Aug 3, 2022

@antorsae Or do you mean derive the relative positional encoding based on the absolute positions of the tokens passed in on forward?
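If that is the idea, rotary embeddings seem like a natural fit: each token only needs angles proportional to its absolute position, and the attention logits then depend only on position differences, i.e. a relative encoding derived from absolute positions. A minimal self-contained sketch of that idea (an illustration, not x-transformers' actual rotary implementation), with torch.arange replaced by arbitrary float positions:

import torch

def rotary_cos_sin(pos, dim, theta = 10000.):
    # pos: (batch, seq_len) float positions; dim: per-head dimension (must be even)
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    freqs = pos[..., None] * inv_freq                # per-position angles, (b, n, dim / 2)
    return freqs.cos(), freqs.sin()

def apply_rotary(x, cos, sin):
    # rotate interleaved channel pairs of x by the per-position angles
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim = -1)
    return out.flatten(-2)

pos = torch.tensor([[-1.5, 0., 2., 3.5, 4.]])        # arbitrary, non-integer positions
q = torch.rand(1, 5, 64)
cos, sin = rotary_cos_sin(pos, dim = 64)
q_rot = apply_rotary(q, cos, sin)                    # keys get the same treatment

Because the dot product of two rotated vectors depends only on the difference of their angles, the relative-position property is preserved even when positions are real-valued and non-uniform.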


Top Results From Across the Web

Write your own custom Attention layer: Easy, intuitive guide
Craft your own Attention layer in 6 lines — Story of how the code evolved ... Pass this through a bi-directional LSTM of...

Tutorial 6: Transformers and Multi-Head Attention
Remember that the Multi-Head Attention layer ignores the position of elements in a sequence, and can only learn it based on the input...

CLIP - Hugging Face
CLIP is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. CLIP uses a...

The Transformer Model - MachineLearningMastery.com
Self-attention layers were found to be faster than recurrent layers for shorter sequence lengths and can be restricted to consider only a...

S18 Sequence to Sequence models: Attention Models
Ivan Bilan: Understanding and Applying Self-Attention for NLP | PyData Berlin 2018 (PyData video)...
