Feature request: pass positions of sequence items in AttentionLayers forward pass
In some applications it is desirable to reference sequence items by arbitrary positions instead of ascending ordinals (torch.arange(max_seq_len...)).

Would it be possible to add a new parameter pos to the forward pass of AttentionLayers?

I am not sure whether all positional encodings would be compatible, but the idea would be something like:
import torch
from x_transformers import Decoder, Encoder

enc = Encoder(
    dim = 512,
    depth = 2,
    heads = 8,
    rotary_pos_emb = True,
)

# equivalent to the current behavior (implicit ascending ordinals)
enc(torch.rand(2, 5, 512), pos = torch.arange(5).unsqueeze(0))

# arbitrary (fractional / non-uniform) positions become possible
enc(torch.rand(2, 5, 512), pos = torch.tensor([[0, 1, 2, 3, 4.1], [-1.5, 0, 2, 3.5, 4]]))
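For the rotary case in the example above, the change would essentially amount to building the rotation frequencies from the supplied pos tensor rather than from torch.arange. A minimal sketch of that idea, using a hypothetical rotary_freqs helper (not part of the x-transformers API):

import torch

def rotary_freqs(pos, dim, theta = 10000):
    # pos: (batch, seq_len) tensor of positions, integer or fractional
    # hypothetical helper -- illustrates the idea, not library code
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    freqs = pos[..., None].float() * inv_freq   # (batch, seq_len, dim // 2)
    return torch.cat((freqs, freqs), dim = -1)  # (batch, seq_len, dim)

# ascending ordinals reproduce the usual rotary frequencies
rotary_freqs(torch.arange(5).unsqueeze(0), dim = 64)

# arbitrary (fractional, non-uniform) positions plug in the same way
rotary_freqs(torch.tensor([[0, 1, 2, 3, 4.1]]), dim = 64)

Since the frequencies are just an outer product of positions and inverse frequencies, fractional or non-uniform positions drop in without further changes to the attention computation itself.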
Issue Analytics
- Created a year ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Yes, but for AttentionLayers (not just TransformerWrapper as in the PR). Also, this would be more attractive if it worked with the other, more performant (relative) positional embeddings/encodings (T5, rotary, etc.).

@antorsae or do you mean deriving the relative positional encoding based on the absolute positions of the tokens passed in on forward?
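If that is the intent, the relative quantities could in principle be derived from the supplied absolute positions as pairwise differences. A minimal sketch under that assumption (relative_position_matrix is a hypothetical illustration, not x-transformers code):

import torch

def relative_position_matrix(pos):
    # pos: (batch, seq_len) absolute positions, integer or fractional
    # pairwise differences: rel[b, i, j] = pos[b, i] - pos[b, j]
    return pos[:, :, None] - pos[:, None, :]    # (batch, seq_len, seq_len)

pos = torch.tensor([[-1.5, 0., 2., 3.5, 4.]])
rel = relative_position_matrix(pos)             # could feed a relative bias scheme

A T5-style bias would additionally need to bucket these (possibly fractional) distances into discrete bins, whereas rotary encodes the position difference directly through its rotation angles.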