
Questions on generating with encoder-decoder models

See original GitHub issue

Hi, I want to conduct a Grammatical Error Correction task with BART, which takes corrupted sentences as inputs and produces corrected sentences as outputs. The model I’m using is BartForConditionalGeneration.

I want to ask several things.

  1. What is the difference between decoder_input_ids and labels? The doc says that when handling seq2seq problems such as translation or summarization, decoder_input_ids should be given; otherwise the model just puts the shifted encoder input into the decoder, which is not the desired behaviour. However, there is another argument labels, and I think I should pass the answer sequence as labels to get the loss. And according to here, I assume that BART takes the answer outputs as labels. Then what is decoder_input_ids? Is it not necessary when calling model.forward to train the model? (See the training sketch after this list.)

  2. Should I pad the decoder inputs with -100? According to the doc, positions that the loss function should ignore have to be set to -100. But I want it to ignore the pad token. Should I just replace the pad token with -100, or is there a way to make the loss function ignore a value I choose?

  3. Unlike training, inference does not require the answers. However, as I mentioned above, if the model is given neither decoder_input_ids nor labels, it puts the shifted inputs into the decoder, which is not what we want; the decoder should start from only the start token. Is it then right to use model.generate rather than model.forward, without giving any decoder inputs? I think I should use model.generate at inference time, but I want to make sure that model.generate(input_ids=input_ids) works as I described, i.e. the decoder is given only the start token at the beginning. In fact, as in the image below, it looks as if the input ids might just have been copied, judging by the values, so I’m worried that the decoder simply took the input ids. [image] (See the generation sketch after this list.)

  4. According to this, BART was pretrained to use the EOS token as the start token of the decoder. I don’t know why that should be, but in any case, as in the image above, all outputs start with both the EOS and the BOS token. May I then assume that the model uses both EOS and BOS as the starting signal?

  5. The last question is about beam search. I want to get the last hidden state from the decoder to do multi-task learning combining LM and sentence classification. But when using beam search, the shape of one tensor from decoder_hidden_states becomes (batch_size*num_beams*num_return_sequences, generated_length, hidden_size). How can we tell which one comes from the best result? (See the hidden-state sketch after this list.)
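
A minimal training sketch for questions 1 and 2, assuming the Hugging Face transformers library and the facebook/bart-base checkpoint (the example sentences are made up). Passing only labels is enough: the model derives decoder_input_ids internally by shifting the labels to the right, and positions set to -100 are ignored by the loss:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

src = ["She go to school yesterday."]    # corrupted input
tgt = ["She went to school yesterday."]  # corrected target

batch = tokenizer(src, return_tensors="pt", padding=True)
labels = tokenizer(tgt, return_tensors="pt", padding=True).input_ids

# Positions set to -100 are ignored by the cross-entropy loss,
# so pad tokens in the labels are masked out here.
labels[labels == tokenizer.pad_token_id] = -100

# Passing only `labels` is enough: the model builds `decoder_input_ids`
# internally by shifting the labels to the right (teacher forcing).
outputs = model(input_ids=batch.input_ids,
                attention_mask=batch.attention_mask,
                labels=labels)
print(outputs.loss)
```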
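
A minimal inference sketch for questions 3 and 4, under the same assumptions. generate seeds the decoder with decoder_start_token_id on its own, so no decoder inputs need to be passed; whether the first generated token is additionally forced to be BOS depends on forced_bos_token_id in the checkpoint's config:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer(["She go to school yesterday."], return_tensors="pt")

# `generate` starts the decoder from `decoder_start_token_id` by itself;
# the encoder inputs are not copied into the decoder.
generated = model.generate(input_ids=inputs.input_ids,
                           attention_mask=inputs.attention_mask,
                           num_beams=4,
                           max_length=64)

# For the released BART checkpoints, decoder_start_token_id is the EOS id,
# and (where the config sets it) forced_bos_token_id forces the first
# generated token to be BOS, which is why outputs begin with </s><s>.
print(model.config.decoder_start_token_id, model.config.forced_bos_token_id)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```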
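
For question 5, one workaround (a sketch, not the only option), continuing from the generation sketch above: re-run a single forward pass with the sequences returned by generate used as decoder inputs, so the hidden states line up with the chosen best hypotheses instead of with all batch_size * num_beams beam candidates:

```python
import torch

# `generated` and `inputs` come from the generation sketch above.
with torch.no_grad():
    out = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                decoder_input_ids=generated,
                output_hidden_states=True)

# Tuple of (num_layers + 1) tensors; the last entry is the final decoder
# layer, shaped (batch_size, generated_length, hidden_size).
last_decoder_hidden = out.decoder_hidden_states[-1]
print(last_decoder_hidden.shape)
```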

Thank you for reading these long questions.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
ZiyueWangUoB commented, Nov 1, 2021

@NielsRogge Yes, that’s what I used at the start. The problem is that I want to convert my model to ONNX, where the generate function is not available. I guess I will have to write my own greedy decoding method.
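
A minimal greedy-decoding sketch against the PyTorch model (the facebook/bart-base checkpoint name is an assumption); the same loop structure would drive an exported ONNX graph by swapping the forward call for an onnxruntime session run:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.eval()

def greedy_decode(model, input_ids, attention_mask, max_length=64):
    # Seed the decoder exactly as `generate` would, with decoder_start_token_id.
    decoder_input_ids = torch.full(
        (input_ids.size(0), 1),
        model.config.decoder_start_token_id,
        dtype=torch.long,
    )
    for _ in range(max_length):
        with torch.no_grad():
            out = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        decoder_input_ids=decoder_input_ids)
        # Pick the highest-scoring token at the last position (greedy step).
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        # Stop when every sequence emits EOS at this step (fine for batch
        # size 1; real code would track finished sequences individually).
        if (next_token == model.config.eos_token_id).all():
            break
    return decoder_input_ids

enc = tokenizer(["She go to school yesterday."], return_tensors="pt")
print(tokenizer.batch_decode(
    greedy_decode(model, enc.input_ids, enc.attention_mask),
    skip_special_tokens=True))
```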

0 reactions
NielsRogge commented, Nov 1, 2021

We’ve actually just added an example of converting BART to ONNX, including beam search generation. However, the example doesn’t include a README right now; one will be added soon.
