
Why do the encoder and decoder use “non_pad_mask”?


https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/20f355eb655bad40195ae302b9d8036716be9a23/transformer/Layers.py#L23

I think the non_pad_mask is not necessary, because padding is already handled by attn_mask. Why is it needed?
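For reference, the two helpers under discussion live in transformer/Models.py of that repo; the sketch below is paraphrased from it (names follow the repo, but exact comments and shapes may differ slightly at the pinned commit):

    import torch
    import transformer.Constants as Constants  # Constants.PAD is the padding index

    def get_non_pad_mask(seq):
        # seq: (batch, seq_len) token ids -> (batch, seq_len, 1) float mask,
        # 1.0 for real tokens, 0.0 for PAD
        assert seq.dim() == 2
        return seq.ne(Constants.PAD).type(torch.float).unsqueeze(-1)

    def get_attn_key_pad_mask(seq_k, seq_q):
        # Boolean mask of shape (batch, len_q, len_k), True where the *key*
        # position is PAD; those attention scores get masked out before softmax.
        len_q = seq_q.size(1)
        padding_mask = seq_k.eq(Constants.PAD)                          # (batch, len_k)
        padding_mask = padding_mask.unsqueeze(1).expand(-1, len_q, -1)  # (batch, len_q, len_k)
        return padding_mask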

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

5 reactions
Yonnie1331 commented, Apr 24, 2019

I think it’s because “get_attn_key_pad_mask” doesn’t mask all the necessary places, so “get_non_pad_mask” masks the parts that are left out.

Example: take the sentence [I, love, Github, PAD, PAD]. After “padding_mask = seq_k.eq(Constants.PAD)”, it becomes [0 0 0 1 1]. After “padding_mask = padding_mask.unsqueeze(1).expand(-1, len_q, -1)”, it turns into

    0 0 0 1 1
    0 0 0 1 1
    0 0 0 1 1
    0 0 0 1 1
    0 0 0 1 1    (mask1)

However, the mask should actually be

    0 0 0 1 1
    0 0 0 1 1
    0 0 0 1 1
    1 1 1 1 1
    1 1 1 1 1    (mask2)

Mask1 is what actually gets used in ScaledDotProductAttention: we get the enc_output from “enc_output, enc_slf_attn = self.slf_attn(enc_input, enc_input, enc_input, mask=slf_attn_mask)”. However, mask2 is what should be used.
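(For context, the attention module consumes this mask roughly as in the simplified sketch below; dropout is omitted and the masking value is written as -inf. The key point is that PAD keys are blocked, but PAD query rows still attend to the real tokens and therefore produce nonzero outputs, which is the gap mask1 leaves open.)

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, temperature, mask=None):
        # q, k, v: (batch, len, d); mask: (batch, len_q, len_k), True at PAD keys
        attn = torch.bmm(q, k.transpose(1, 2)) / temperature
        if mask is not None:
            attn = attn.masked_fill(mask, float('-inf'))  # no attention *to* PAD keys
        attn = F.softmax(attn, dim=-1)  # PAD *query* rows still attend to real tokens
        output = torch.bmm(attn, v)     # ...so their output rows remain nonzero
        return output, attn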

Now let’s look at “get_non_pad_mask”: for the same sentence, after “seq.ne(Constants.PAD).type(torch.float).unsqueeze(-1)”, it turns into [1 1 1 0 0].

So this is what “enc_output *= non_pad_mask” does. After this element-wise multiplication, the PAD rows that mask1 left untouched are finally zeroed out, and the output again reflects the sentence’s true length before padding.
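A minimal numeric sketch of that last step (toy tensors, with the PAD index assumed to be 0 as in the repo’s Constants):

    import torch

    # Same toy sentence as above: [I, love, Github, PAD, PAD]
    seq = torch.tensor([[5, 8, 13, 0, 0]])       # (batch=1, len=5), 0 = PAD
    enc_output = torch.randn(1, 5, 4)            # pretend self-attention output, d_model=4

    # mask1 only blocked attention *to* PAD keys, so rows 3 and 4 are still nonzero here.
    non_pad_mask = seq.ne(0).type(torch.float).unsqueeze(-1)   # (1, 5, 1)
    enc_output = enc_output * non_pad_mask       # element-wise, zeroes out the PAD rows

    print(enc_output[0, 3:])                     # the two PAD rows are now all zeros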

1 reaction
PkuDavidGuan commented, Feb 18, 2019

You’re right. I don’t think there would be any problem without non_pad_mask. Maybe the author has a better answer. @jadore801120


