Error about the mask in ScaledDotProductAttention
Currently, the attention mask used in ScaledDotProductAttention is generated at line 28 of Models.py by:
pad_attn_mask = seq_k.data.eq(Constants.PAD).unsqueeze(1)
pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)
Ignoring the batch dimension for the explanation, assume the generated pad_attn_mask is a matrix of shape (len_q, len_k). The code above then produces a mask of the block form [A 1], where 1 is an all-ones submatrix covering the columns of the padded key positions. However, I think the generated attention mask should instead have the block form [B 1 // 1 1], where // denotes a row break (apologies for the plain-text notation), i.e. the rows corresponding to padded query positions should be masked as well.
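For concreteness, here is a minimal PyTorch sketch of the two mask variants described above. This is not the repository's code; PAD = 0 and the toy sequences are assumptions made purely for illustration.

import torch

PAD = 0  # assumed value standing in for Constants.PAD
seq_q = torch.tensor([[5, 7, 2, PAD]])   # (mb_size, len_q); last query position is padding
seq_k = torch.tensor([[5, 7, 2, PAD]])   # (mb_size, len_k); last key position is padding
mb_size, len_q = seq_q.size()
len_k = seq_k.size(1)

# Mask as generated in Models.py: 1 wherever the key is padding, broadcast over
# every query row, giving the [A 1] block pattern (all-ones columns on the right).
pad_attn_mask = seq_k.eq(PAD).unsqueeze(1).expand(mb_size, len_q, len_k)

# Variant proposed in this issue: additionally mask the rows whose query is padding,
# giving the [B 1 // 1 1] block pattern.
query_pad_mask = seq_q.eq(PAD).unsqueeze(2).expand(mb_size, len_q, len_k)
full_pad_mask = pad_attn_mask | query_pad_mask

print(pad_attn_mask[0].int())   # last column is all ones
print(full_pad_mask[0].int())   # last column and last row are all ones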
@jadore801120 @seasa2016 Thanks for the clarification; now I understand it.
I think it might not be a problem, since the embedding of the padding word is zero in every dimension. Thus it does not matter whether those positions are masked or not.
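As a quick check of that reasoning, here is a small sketch (PAD = 0, the vocabulary size, and the embedding dimension are assumptions) showing that an embedding layer built with padding_idx keeps the padding word's embedding at all zeros, which is what the comment above relies on:

import torch
import torch.nn as nn

PAD = 0  # assumed value standing in for Constants.PAD
emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=PAD)

seq = torch.tensor([[3, 5, PAD, PAD]])
vectors = emb(seq)
print(vectors[0, 2:])   # the two padded positions map to all-zero vectors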