
Why the outputs from decoder layers are concatenated while this is not the case for the encoder

See original GitHub issue

Hi,

I have a question about the code for the transformer encoder and decoder. I am looking at the case args.deformable=False and args.Tracking=False. Reading transformer.py, I am a bit confused: self.num_layers is 6 for both the encoder and the decoder, yet in the decoder forward the output of every layer is stored and the stored outputs are then stacked to form the final output:

    for i, layer in enumerate(self.layers):
        if self.track_attention:
            # run the extra track-attention layer on the track queries
            # (everything except the last 100 object queries)
            track_output = output[:-100].clone()

            track_output = self.layers_track_attention[i](
                track_output,
                src_mask=tgt_mask,
                src_key_padding_mask=tgt_key_padding_mask,
                pos=track_query_pos)

            output = torch.cat([track_output, output[-100:]])

        output = layer(output, memory, tgt_mask=tgt_mask,
                       memory_mask=memory_mask,
                       tgt_key_padding_mask=tgt_key_padding_mask,
                       memory_key_padding_mask=memory_key_padding_mask,
                       pos=pos, query_pos=query_pos)

        if self.return_intermediate:
            # keep this layer's output, not just the final one
            intermediate.append(output)

    if self.return_intermediate:
        # stack the per-layer outputs along a new leading dimension
        output = torch.stack(intermediate)

    if self.norm is not None:
        return self.norm(output), output

For the encoder, however, the loop runs over all layers but only the last layer's output is kept:

    for layer in self.layers:
        output = layer(output, src_mask=mask,
                       src_key_padding_mask=src_key_padding_mask, pos=pos)
        print(output.shape, 'enc-layer')  # debug print: every layer produces the same shape

    if self.norm is not None:
        output = self.norm(output)

Is this a bug, or am I misunderstanding something?

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
timmeinhardt commented, May 12, 2022

Please read the paper and its related work to fully understand how our method is working.
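
For context, DETR-style detectors (which this code follows) train with auxiliary decoding losses: the same prediction heads and matching loss are applied to the output of every decoder layer, which is why the decoder returns the stacked intermediate outputs, while only the last encoder layer's memory is ever consumed (it is simply the decoder's input). Below is a minimal sketch of that pattern; the head names, dict keys, and tensor shapes are illustrative, not this repo's exact modules.

    import torch
    import torch.nn as nn

    class IllustrativeHeads(nn.Module):
        """Sketch of DETR-style prediction heads applied to every decoder layer.

        hs: stacked decoder outputs with the layer dimension first,
        here assumed to be [num_layers, batch, num_queries, hidden_dim].
        """

        def __init__(self, hidden_dim=256, num_classes=91):
            super().__init__()
            self.class_embed = nn.Linear(hidden_dim, num_classes + 1)
            self.bbox_embed = nn.Linear(hidden_dim, 4)

        def forward(self, hs):
            # the same heads run on every layer's output (broadcast over dim 0)
            outputs_class = self.class_embed(hs)            # [L, B, Q, C+1]
            outputs_coord = self.bbox_embed(hs).sigmoid()   # [L, B, Q, 4]
            out = {'pred_logits': outputs_class[-1],        # last layer -> main prediction
                   'pred_boxes': outputs_coord[-1]}
            # one auxiliary prediction dict per intermediate decoder layer;
            # each one gets the same matching loss during training
            out['aux_outputs'] = [{'pred_logits': a, 'pred_boxes': b}
                                  for a, b in zip(outputs_class[:-1], outputs_coord[:-1])]
            return out

    hs = torch.randn(6, 2, 100, 256)     # 6 decoder layers, batch 2, 100 queries
    out = IllustrativeHeads()(hs)
    print(len(out['aux_outputs']))       # 5 auxiliary prediction sets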

0 reactions
mkhoshle commented, May 12, 2022

Does that mean the transformer has losses besides the ones that are introduced in the paper?
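
In the DETR formulation this code builds on, the auxiliary decoding losses are not new loss terms: the same set-prediction loss is repeated for each intermediate decoder layer's predictions and summed into the total. A rough, self-contained sketch of that summation, using a toy stand-in criterion and only classification logits (all names here are illustrative):

    import torch
    import torch.nn.functional as F

    num_layers, B, Q, C = 6, 2, 100, 92
    # pretend per-layer class logits, standing in for outputs_class in the sketch above
    outputs_class = torch.randn(num_layers, B, Q, C)
    out = {'pred_logits': outputs_class[-1],
           'aux_outputs': [{'pred_logits': a} for a in outputs_class[:-1]]}

    def toy_criterion(pred, target):
        # stand-in for the real set-prediction criterion (Hungarian matching + class/box terms)
        return {'loss_ce': F.cross_entropy(pred['pred_logits'].flatten(0, 1), target)}

    target = torch.randint(0, C, (B * Q,))          # dummy per-query labels
    losses = toy_criterion(out, target)             # loss on the last decoder layer
    for aux in out['aux_outputs']:                  # one extra term per earlier layer
        aux_losses = toy_criterion(aux, target)
        losses = {k: v + aux_losses[k] for k, v in losses.items()}
    total_loss = sum(losses.values())
    print(total_loss)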

Read more comments on GitHub >

Top Results From Across the Web

  • Concatenate layer shape error in sequence2sequence model ...
    When you specify axis as -1 (default value), your concatenation layer basically flattens the input before use, which in your case does not ...
  • Transformer-based Encoder-Decoder Models - Hugging Face
    The transformer-based encoder-decoder model was introduced by Vaswani ... output space, which the self-attention layer does not manage to do ...
  • Transformers Explained Visually (Part 3): Multi-head Attention ...
    The Attention layer takes its input in the form of three parameters, known as the Query, Key, and Value. All three parameters are ...
  • How Does Attention Work in Encoder-Decoder Recurrent ...
    In this case, a bidirectional input is used where the input sequences are provided both forward and backward, which are then concatenated ...
  • Understanding and Improving Encoder Layer Fusion in ...
    The experiments show that the encoder embedding layer is beneficial for all decoder layers, why the proposed SurfaceFusion does not consider connecting the ...
