
Score2Perf: A Question Regarding Mean-Aggregation and Decoder Inputs


Hi all,

Thank you to the Magenta team for building this - it's truly amazing to see what a community like this can accomplish. Makes 2020 that much better 👍 Full disclosure: I'm coming at this from PyTorch, and most of my experience is with Huggingface's models, so forgive me if I mis-parse or don't yet understand Tensorflow conventions. Go easy on me!

Context: In the paper accompanying the score2perf model, Encoding Musical Style with Transformer Autoencoders, you discuss 'temporal compression' (diagram on page 2, reproduced below). I am uncertain how the compressed representation is attended to in the decoder:

[figure: Transformer autoencoder diagram from page 2 of the paper]

The place in the code where this appears to be implemented is here (around line 110 of transformer_autoencoder.py):

  if not baseline:
    # mean-pool over the time axis: [batch, seq_len, d_model] -> [batch, 1, d_model]
    encoder_output = tf.math.reduce_mean(
        encoder_output, axis=1, keep_dims=True)
    # collapse the length dimension of the attention bias the same way
    encoder_decoder_attention_bias = tf.math.reduce_mean(
        encoder_decoder_attention_bias, axis=-1, keep_dims=True)

My Questions

First - doesn't reduce_mean include in its average the outputs at <pad> positions? E.g., a sequence of length 123 fed to a model with a block size/max input length of 512 would have 512 - 123 = 389 pad tokens at its end, which (I believe) means the final 389 vectors in the encoder output (of shape [1, 512, d_model]) are meaningless, since the input attention mask is zero at all those positions. Shouldn't we aggregate only the non-pad outputs?
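
For concreteness, here's the masked pooling I have in mind - a minimal sketch of my own, not code from the Magenta repo, and the pad_mask argument is hypothetical:

  import tensorflow as tf

  def masked_mean(encoder_output, pad_mask):
    # encoder_output: [batch, seq_len, d_model]
    # pad_mask: [batch, seq_len], 1.0 at real tokens, 0.0 at <pad> positions
    mask = tf.cast(pad_mask, encoder_output.dtype)[:, :, tf.newaxis]
    total = tf.reduce_sum(encoder_output * mask, axis=1, keepdims=True)
    count = tf.reduce_sum(mask, axis=1, keepdims=True)
    return total / tf.maximum(count, 1.0)  # [batch, 1, d_model], pads excluded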

Second - how is the resulting vector attended to in the decode step? I'm used to encoder outputs of shape [batch, block_size, d_model], not [batch, 1, d_model] (which I believe is the case here). Are all the cross-attention layers attending to a single vector? Is that okay?
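
To illustrate what I mean, here's a toy sketch (my own code, not Magenta's): with a single key/value position, the softmax over that axis is identically 1.0, so every decoder position receives (a projection of) the same pooled vector from cross-attention:

  import tensorflow as tf

  batch, q_len, d_model = 2, 5, 8
  queries = tf.random.normal([batch, q_len, d_model])
  pooled = tf.random.normal([batch, 1, d_model])  # stand-in for the mean-pooled encoding

  scores = tf.matmul(queries, pooled, transpose_b=True) / (d_model ** 0.5)  # [batch, q_len, 1]
  weights = tf.nn.softmax(scores, axis=-1)  # only one position to attend to -> all ones
  context = tf.matmul(weights, pooled)  # [batch, q_len, d_model]: pooled vector broadcast to every step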

Third - what do the decoder's teacher-forcing inputs look like at train time? Say we're encoding input tokens [[4, 2, 7]]. Are the decoder inputs [[0, 4, 2]] (the standard right shift used in T5-style causal language modeling), or something else? I'm not sure how the aggregated encoding changes things.
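
To pin down the convention I mean, here is a sketch of the standard right shift (my assumption, not necessarily what this repo does):

  import tensorflow as tf

  targets = tf.constant([[4, 2, 7]])  # [batch, seq_len]
  # prepend a 0 (start token) and drop the last position: [[4, 2, 7]] -> [[0, 4, 2]]
  decoder_inputs = tf.pad(targets, [[0, 0], [1, 0]])[:, :-1]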

Any help would be greatly appreciated!


Top GitHub Comments

1 reaction
apteryxlabs commented, Jan 25, 2021

@eeelnico You'll want to read this.

0 reactions
apteryxlabs commented, Jan 25, 2021

Thanks @kristychoi! We can expand our dataset fairly readily; I'll try that plus the masking (if present). For the perturbations, we'll definitely try masks and substitutions. If anything else we try works, I'll be sure to post it here.

Thank you for your help! It's good to know we're more or less on the right track.
