Questions on generating using encoder-decoder models
Hi, I want to conduct a Grammatical Error Correction task with BART, which takes corrupted sentences as inputs and produces corrected sentences as outputs.
The model I'm using is `BartForConditionalGeneration`.
I want to ask several things.
- What is the difference between `decoder_input_ids` and `labels`? The docs say that, when handling seq2seq problems such as translation or summarization, `decoder_input_ids` should be given; otherwise the model just puts the shifted encoder input into the decoder, which is not the desired behavior. However, there is another argument, `labels`, and I think I should give the answer sequence as `labels` to get the loss. And according to here, I assume that BART takes the answer outputs as `labels`. Then what is `decoder_input_ids` for? Is it unnecessary when using the `model.forward` function to train the model? (A rough sketch of how I am currently calling the model is below, after these questions.)
- Should I pad the decoder inputs with `-100`? According to the docs, positions that the loss function should ignore have to be set to `-100`. But I want it to ignore the pad token. Should I just replace the pad token with `-100`, or is there a way to make the loss function ignore a value of my choosing?
- Unlike training, inference does not require the answers. However, as I mentioned above, if the model is given neither `decoder_input_ids` nor `labels`, it puts the shifted inputs into the decoder. But this is not what we want: the decoder should start only with the start token. So is it right to use the `model.generate` function, rather than `model.forward`, without giving any decoder inputs? I think I should use `model.generate` at inference time, but I want to make sure that `model.generate(input_ids=input_ids)` works as I described, i.e. that the decoder starts from only the start token. In fact, as in the image below, it looks like the input ids might just have been copied, judging by the values, so I'm worried that the decoder simply took the input ids.
- According to this, BART was pretrained to use the EOS token as the start token of the decoder. I don't know why that should be, but in any case, as the image above shows, all outputs start with both the EOS and BOS tokens. So may I assume that the model uses both the EOS and BOS tokens as the starting sign?
- The last question is about beam search. I want to get the last hidden state of the decoder to do multi-task learning that combines LM and sentence classification. But when using beam search, the shape of one tensor from `decoder_hidden_states` becomes `(batch_size*num_beams*num_return_sequences, generated_length, hidden_size)`. How can I know which one comes from the best result?
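For reference, here is a minimal sketch of what I am currently doing. The model name (`facebook/bart-base`), the toy sentences, and the assumption that passing only `labels` makes the model build `decoder_input_ids` by shifting them internally are all mine, and the last point is exactly what I want to confirm:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Corrupted source sentences and their corrected targets (toy examples).
src = ["She go to school yesterday ."]
tgt = ["She went to school yesterday ."]

inputs = tokenizer(src, return_tensors="pt", padding=True)
targets = tokenizer(tgt, return_tensors="pt", padding=True)

labels = targets["input_ids"].clone()
# Question 2: replace pad token ids with -100 so the cross-entropy loss
# ignores the padded positions (my assumption of the intended usage).
labels[labels == tokenizer.pad_token_id] = -100

# Question 1: only `labels` is passed here; my understanding (to be
# confirmed) is that decoder_input_ids are then built internally by
# shifting the labels to the right.
outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    labels=labels,
)
print(outputs.loss)

# Question 3: at inference time, no decoder inputs are given and I expect
# generate() to start the decoder from its start token by itself.
generated = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=4,
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```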
Thank you for reading these long questions.
@NielsRogge Yes, that's what I used at the start. The problem is that I want to convert my model to ONNX, where the `generate` function is not available. I guess I will have to write my own greedy decoding method.

We've actually just added an example of converting BART to ONNX, including beam search generation. However, the example doesn't include a README right now; it will be added soon.
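For reference, a manual greedy decoding loop along the lines described above could look roughly like this. It is only a sketch against the regular PyTorch model; in the ONNX setting the forward pass would be replaced by an `onnxruntime` session run, and the model name and maximum length are placeholders:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.eval()

enc = tokenizer("She go to school yesterday .", return_tensors="pt")

# BART starts decoding from decoder_start_token_id, which is the EOS token;
# the BOS token is then usually generated right after it, which is why
# generated sequences begin with both </s> and <s>.
decoder_input_ids = torch.full(
    (enc["input_ids"].size(0), 1),
    model.config.decoder_start_token_id,
    dtype=torch.long,
)

with torch.no_grad():
    for _ in range(64):  # maximum generation length (placeholder)
        logits = model(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
            decoder_input_ids=decoder_input_ids,
        ).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == model.config.eos_token_id:
            break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```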