
What's the consideration for not applying positional encoding to V in the self-attention layer?

See original GitHub issue

Question: What's the consideration for not applying positional encoding to V in the self-attention layer?

def forward_post(self,
                 src,
                 src_mask: Optional[Tensor] = None,
                 src_key_padding_mask: Optional[Tensor] = None,
                 pos: Optional[Tensor] = None):
    # The positional encoding is added to the queries and keys only;
    # the values are the raw content features `src`.
    q = k = self.with_pos_embed(src, pos)
    src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,
                          key_padding_mask=src_key_padding_mask)[0]
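
For context, the with_pos_embed helper used above is essentially an element-wise addition of the positional encoding; roughly, it does the following (a sketch, not a verbatim copy of the repository code):

def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    # Add the (absolute) positional encoding to the features; pass through if none is given.
    return tensor if pos is None else tensor + pos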

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
szagoruyko commented, Jun 27, 2020

Hi, we follow the standard practice of adding positional encoding to the queries and keys only (see Transformer-XL or Stand-Alone Self-Attention in Vision Models), except that in our case the encoding is absolute rather than relative.
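
To make that distinction concrete, here is a minimal, self-contained sketch (not code from this repository): with absolute encodings the position is added to the inputs of the query and key projections, while a relative scheme (as in Transformer-XL or Shaw et al.) adds a learned bias that depends on the offset between positions. In both cases the value path stays content-only.

import torch
import torch.nn.functional as F

def attn_absolute(x, pos, Wq, Wk, Wv):
    # Absolute encoding: pos is added before the query/key projections only.
    q = (x + pos) @ Wq
    k = (x + pos) @ Wk
    v = x @ Wv                                   # values carry content only
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def attn_relative(x, rel_bias, Wq, Wk, Wv):
    # Relative encoding (schematic): a bias indexed by the offset i - j is added to the logits.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5 + rel_bias
    return F.softmax(scores, dim=-1) @ v

Here x and pos are (sequence_length, d) tensors, Wq/Wk/Wv are (d, d) projection matrices, and rel_bias is a (sequence_length, sequence_length) matrix of learned offset biases.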

1 reaction
alexzeng1206 commented, May 23, 2022

+1. I have read the code and found that although the positional encoding is added to q and k when computing self/cross-attention, the output features are obtained only from v, which carries appearance features without any positional encoding, throughout the process. So I don't understand why the final output slots contain spatial information that lets the FFN predict bounding boxes from them. Can anyone explain where that spatial information comes from?
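
One way to see where that information enters (a toy check using an illustrative attend helper, not the repository code): the values themselves are position-free, but the softmax weights that mix them are computed from position-augmented queries and keys, so the weighted sum of values changes when the positional encoding changes.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, d = 4, 8
x = torch.randn(N, d)                      # content features (shared in both calls)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def attend(x, pos):
    # pos enters only through q and k; v is projected from the raw content.
    q, k, v = (x + pos) @ Wq, (x + pos) @ Wk, x @ Wv
    weights = F.softmax(q @ k.T / d ** 0.5, dim=-1)
    return weights @ v

out_a = attend(x, torch.randn(N, d))       # one positional encoding
out_b = attend(x, torch.randn(N, d))       # a different positional encoding
print(torch.allclose(out_a, out_b))        # False: the mixing weights, and hence the output, depend on position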

Read more comments on GitHub >

Top Results From Across the Web

  • Self-Attention and Positional Encoding - mxnet - D2L Discussion
    Hey @sprajagopal, great question! First just for clarification, Q, K and V don't need to be the same. They might be the same in...
  • Relative Positional Encoding - Jake Tae
    In this post, we will take a look at relative positional encoding, as introduced in Shaw et al. (2018) and refined by Huang...
  • RETHINKING POSITIONAL ENCODING IN LANGUAGE PRE ...
    we propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE). In the self-attention module, TUPE computes the...
  • Attention Mechanism, Transformers, BERT, and GPT - OSF
    self-attention, and attention in different areas ... which do not use any recurrence. We ex- ... the transformer, including positional encoding, ...
  • Relative Positional Encoding for Transformers with Linear ...
    where φ : R^D → R^R is a non-linear feature map applied ... As an example of positional encoding in the attention ...
