Error about the mask in ScaledDotProductAttention
Currently, the attention mask used in ScaledDotProductAttention is generated at line 28 of Models.py by:
pad_attn_mask = seq_k.data.eq(Constants.PAD).unsqueeze(1)
pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)
Ignoring the batch dimension for the explanation, assume the generated pad_attn_mask is a matrix of shape (len_q, len_k). The code above then produces a mask of the block form [A 1], where 1 is an all-ones submatrix covering the columns of the padded key positions. However, I think the generated attention mask should instead have the block form [B 1 // 1 1], where // denotes a row break (apologies for the plain-text notation), i.e. the rows corresponding to padded query positions should be masked as well.
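For concreteness, here is a minimal PyTorch sketch of the two mask variants described above. This is not the repository's code; PAD = 0 and the toy sequences are assumptions made purely for illustration.

import torch

PAD = 0  # assumed value standing in for Constants.PAD
seq_q = torch.tensor([[5, 7, 2, PAD]])   # (mb_size, len_q); last query position is padding
seq_k = torch.tensor([[5, 7, 2, PAD]])   # (mb_size, len_k); last key position is padding
mb_size, len_q = seq_q.size()
len_k = seq_k.size(1)

# Mask as generated in Models.py: 1 wherever the key is padding, broadcast over
# every query row, giving the [A 1] block pattern (all-ones columns on the right).
pad_attn_mask = seq_k.eq(PAD).unsqueeze(1).expand(mb_size, len_q, len_k)

# Variant proposed in this issue: additionally mask the rows whose query is padding,
# giving the [B 1 // 1 1] block pattern.
query_pad_mask = seq_q.eq(PAD).unsqueeze(2).expand(mb_size, len_q, len_k)
full_pad_mask = pad_attn_mask | query_pad_mask

print(pad_attn_mask[0].int())   # last column is all ones
print(full_pad_mask[0].int())   # last column and last row are all ones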
@jadore801120 @seasa2016 Thanks for the clarification; now I understand it.
I think it might not be a problem, since the embedding of the padding word is zero in every dimension. Thus it does not matter whether those positions are masked or not.
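As a quick check of that reasoning, here is a small sketch (PAD = 0, the vocabulary size, and the embedding dimension are assumptions) showing that an embedding layer built with padding_idx keeps the padding word's embedding at all zeros, which is what the comment above relies on:

import torch
import torch.nn as nn

PAD = 0  # assumed value standing in for Constants.PAD
emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=PAD)

seq = torch.tensor([[3, 5, PAD, PAD]])
vectors = emb(seq)
print(vectors[0, 2:])   # the two padded positions map to all-zero vectors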