
Error about the mask in ScaledDotProductAttention

See original GitHub issue

Currently, the attention mask used by ScaledDotProductAttention is generated at line 28 of Models.py by:

    pad_attn_mask = seq_k.data.eq(Constants.PAD).unsqueeze(1)
    pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)

Ignoring the batch dimension for the sake of explanation, the generated pad_attn_mask is a matrix of shape (len_q × len_k), and this code produces a matrix of the block form

    [ A  1 ]

where 1 denotes an all-ones submatrix covering the padded key columns. However, I think the attention mask should instead have the block form

    [ B  1 ]
    [ 1  1 ]

i.e. the padded query rows should be masked as well.
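To make the shapes concrete, here is a minimal, self-contained sketch of the quoted construction, wrapped in a helper function just for the sketch and assuming Constants.PAD is 0; the toy sequence is made up for illustration:

    import torch

    PAD = 0  # assumed value of Constants.PAD

    def get_padding_mask(seq_q, seq_k):
        # Returns a (mb_size, len_q, len_k) mask that is True wherever seq_k is padding.
        mb_size, len_q = seq_q.size()
        _, len_k = seq_k.size()
        pad_attn_mask = seq_k.data.eq(PAD).unsqueeze(1)        # (mb_size, 1, len_k)
        pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)
        return pad_attn_mask

    # One sequence of length 5 whose last two tokens are padding,
    # used as both the query and the key sequence.
    seq = torch.tensor([[4, 7, 9, PAD, PAD]])
    print(get_padding_mask(seq, seq)[0].int())
    # tensor([[0, 0, 0, 1, 1],
    #         [0, 0, 0, 1, 1],
    #         [0, 0, 0, 1, 1],
    #         [0, 0, 0, 1, 1],
    #         [0, 0, 0, 1, 1]], dtype=torch.int32)

Only the padded key columns end up masked; the padded query rows (the last two rows) do not, which is exactly the discrepancy the question is about.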

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
yangze0930 commented, Aug 27, 2018

@jadore801120 @seasa2016 Thanks for your clarification, now I understand it.

0 reactions
seasa2016 commented, Aug 26, 2018

I think it might not be a problem, since the embedding of the padding word is zero in every dimension. Thus it is not important whether it is masked or not.
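For what it is worth, a small numeric sketch of the point under discussion: masking the padded key columns changes every real query's softmax, while the padded query rows only produce outputs at padding positions, which a training loss that excludes PAD never uses. All names and values below are illustrative, not taken from the repo.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    len_q = len_k = 5
    d_k = 8
    q = torch.randn(len_q, d_k)   # queries; positions 3 and 4 belong to padding tokens
    k = torch.randn(len_k, d_k)   # keys
    v = torch.randn(len_k, d_k)   # values

    key_is_pad = torch.tensor([False, False, False, True, True])

    scores = q @ k.t() / d_k ** 0.5
    # Masking the padded keys changes every row's softmax, so it affects the
    # attention outputs of the real positions 0-2.
    scores = scores.masked_fill(key_is_pad.unsqueeze(0), float('-inf'))
    attn = F.softmax(scores, dim=-1)
    out = attn @ v

    # The padded query rows (3 and 4) still produce an output, but those rows
    # correspond to padding tokens; a loss that ignores PAD positions never
    # reads them, so masking those rows would not change anything trained on.
    print(out[:3])   # the only outputs that feed the loss in that setting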

Read more comments on GitHub >

Top Results From Across the Web

xformers/scaled_dot_product.py at main - attention - GitHub
Implementing the Scaled Dot-Product attention proposed in ... att_mask: a 2D or 3D mask which ignores attention at certain positions.
How to Implement Scaled Dot-Product Attention from Scratch ...
You may note that the scaled dot-product attention can also apply a mask to the attention scores before feeding them into the softmax...
L19.4.2 Self-Attention and Scaled Dot-Product Attention
7.2K views · Intro to Deep Learning and Generative Models Course.
