What's the consideration of not applying positional encoding to V in the self-attention layer?

Question: What's the consideration of not applying positional encoding to V in the self-attention layer?
from typing import Optional

from torch import Tensor

def forward_post(self,
                 src,
                 src_mask: Optional[Tensor] = None,
                 src_key_padding_mask: Optional[Tensor] = None,
                 pos: Optional[Tensor] = None):
    # The positional encoding is added to the queries and keys only.
    q = k = self.with_pos_embed(src, pos)
    # The values are the raw features, without positional encoding.
    src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,
                          key_padding_mask=src_key_padding_mask)[0]
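The with_pos_embed helper is not shown in the excerpt; assuming it follows the usual pattern of simply adding the positional embedding element-wise to the features, a minimal sketch would be:

def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    # Assumed behavior: element-wise addition of the positional embedding,
    # falling back to the identity when no positional encoding is given.
    return tensor if pos is None else tensor + pos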
Top Results From Across the Web
- Self-Attention and Positional Encoding - mxnet - D2L Discussion
- Relative Positional Encoding - Jake Tae
- Rethinking Positional Encoding in Language Pre-training (TUPE)
- Attention Mechanism, Transformers, BERT, and GPT - OSF
- Relative Positional Encoding for Transformers with Linear Attention
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi, we follow the standard practice of adding positional encoding to the queries and keys only (see Transformer-XL or Stand-Alone Self-Attention in Vision Models), except that in our case it is absolute rather than relative.
+1. I have read the code and found that although the positional encoding is added to q and k when computing self/cross-attention, the output features are obtained only from v, which carries appearance features without any positional encoding. So I don't understand why the final output slot contains the spatial information that lets the FFN predict a bounding box from it. Can anyone explain where this spatial information comes from?
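For reference, here is a minimal single-head sketch (illustrative only, not the DETR implementation; the function and parameter names below are hypothetical) of the pattern discussed in the comments above: the positional encoding enters only the query/key projections, the values stay position-free, yet the attention weights, and therefore the output mixture, still depend on the positions.

import math

import torch.nn.functional as F

def single_head_attn_pos_on_qk(src, pos, w_q, w_k, w_v):
    # src: (seq_len, d_model) appearance features
    # pos: (seq_len, d_model) positional encoding
    # w_q, w_k, w_v: (d_model, d_head) projection weights
    q = (src + pos) @ w_q                       # queries see the positional encoding
    k = (src + pos) @ w_k                       # keys see the positional encoding
    v = src @ w_v                               # values are position-free
    scores = (q @ k.T) / math.sqrt(q.size(-1))  # scores depend on pos via q and k
    weights = F.softmax(scores, dim=-1)
    return weights @ v                          # position-dependent mixture of position-free values

Permuting pos while keeping src fixed changes the output, which is one way to see how spatial information can reach the output features even though v itself carries none: it enters through the attention weights rather than through the values.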