Why are the outputs from the decoder layers concatenated, while this is not the case for the encoder?
See original GitHub issue

Hi,

I have a question regarding the code for the transformer encoder and decoder. I am looking at the case args.deformable=False and args.Tracking=False. Reading transformer.py I am a bit confused by the code. I checked self.num_layers and it is 6 for both the encoder and the decoder. However, in the decoder forward the output of each layer is stored, and the stored outputs are then stacked to form the final output:
```python
for i, layer in enumerate(self.layers):
    if self.track_attention:
        track_output = output[:-100].clone()

        track_output = self.layers_track_attention[i](
            track_output,
            src_mask=tgt_mask,
            src_key_padding_mask=tgt_key_padding_mask,
            pos=track_query_pos)

        output = torch.cat([track_output, output[-100:]])

    output = layer(output, memory, tgt_mask=tgt_mask,
                   memory_mask=memory_mask,
                   tgt_key_padding_mask=tgt_key_padding_mask,
                   memory_key_padding_mask=memory_key_padding_mask,
                   pos=pos, query_pos=query_pos)
    if self.return_intermediate:
        intermediate.append(output)

if self.return_intermediate:
    output = torch.stack(intermediate)

if self.norm is not None:
    return self.norm(output), output
```
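For intuition, here is a minimal, self-contained sketch of the return_intermediate pattern using toy layers (the shapes and the stand-in nn.Linear layers are assumptions for illustration, not the actual code above); it shows that stacking adds a leading dimension of size num_layers to the decoder output:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the six decoder layers (each just maps d_model -> d_model).
num_layers, num_queries, batch, d_model = 6, 100, 2, 256
layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_layers)])

output = torch.randn(num_queries, batch, d_model)  # object/track queries
intermediate = []

for layer in layers:
    output = layer(output)       # one "decoder layer" update
    intermediate.append(output)  # keep this layer's result

stacked = torch.stack(intermediate)
print(stacked.shape)  # torch.Size([6, 100, 2, 256]) -> [num_layers, num_queries, batch, d_model]
```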
For the encoder, however, the loop also runs over all six layers, but the returned output is only the result of the last layer:
```python
for layer in self.layers:
    output = layer(output, src_mask=mask,
                   src_key_padding_mask=src_key_padding_mask, pos=pos)
    print(output.shape, 'enc-layer')

if self.norm is not None:
    output = self.norm(output)
```
Is this a bug, or am I misunderstanding something?
Top GitHub Comments
Please read the paper and its related work to fully understand how our method works.
Does that mean the transformer has losses besides the ones that are introduced in the paper?
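For context, the usual reason DETR-style detection transformers return the stacked intermediate decoder outputs is deep supervision: the same prediction heads and the set-based matching loss are applied to every decoder layer's output, not only the last one, which is also why nothing comparable is needed on the encoder side. Below is a rough sketch of that pattern; the head names class_embed and bbox_embed follow DETR conventions and are assumptions here, not code taken from this repository.

```python
import torch
import torch.nn as nn

num_layers, num_queries, batch, d_model, num_classes = 6, 100, 2, 256, 91

# Hypothetical shared prediction heads (DETR-style naming, assumed here).
class_embed = nn.Linear(d_model, num_classes + 1)
bbox_embed = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                           nn.Linear(d_model, 4))

# Stacked decoder output as returned when return_intermediate is enabled:
# [num_layers, num_queries, batch, d_model]
hs = torch.randn(num_layers, num_queries, batch, d_model)

# The heads are applied to every layer's output ...
outputs_class = class_embed(hs)           # [6, 100, 2, num_classes + 1]
outputs_coord = bbox_embed(hs).sigmoid()  # [6, 100, 2, 4]

# ... so a loss can be computed per decoder layer ("auxiliary decoding losses"):
# index -1 gives the final predictions, the rest supervise the earlier layers.
final_logits, aux_logits = outputs_class[-1], outputs_class[:-1]
print(final_logits.shape, aux_logits.shape)
```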