Transformer-XL: Input and labels for Language Modeling
❓ Questions & Help
Details
I’m trying to fine-tune the pretrained Transformer-XL model `transfo-xl-wt103` for a language modeling task. Therefore, I use the model class `TransfoXLLMHeadModel`.
To iterate over my dataset I use the `LMOrderedIterator` from the file `tokenization_transfo_xl.py`, which yields a tensor with the data and its target for each batch (plus the sequence length).
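To make the setup concrete, here is a minimal sketch of how these pieces fit together (the corpus handling and training loop are simplified placeholders of my own, and I’m assuming `LMOrderedIterator` is importable from `transformers.tokenization_transfo_xl` as in my installed version):

```python
import torch
from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer
from transformers.tokenization_transfo_xl import LMOrderedIterator

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.train()

# Hypothetical corpus: encode the raw text into one flat 1-D tensor of token ids.
text = "some long training text ..."
token_ids = torch.tensor(tokenizer.encode(text), dtype=torch.long)

# LMOrderedIterator splits the flat stream into `bsz` parallel streams and yields
# (data, target, seq_len) per step, where `target` is `data` shifted by one token.
iterator = LMOrderedIterator(token_ids, bsz=1, bptt=8)

for data, target, seq_len in iterator:
    output = model(input_ids=data, labels=target)
    # ... loss extraction, backward pass and optimizer step omitted here
```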
My question:
Let’s assume the following data with `batch_size = 1` and `bptt = 8`:

```python
data = tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
target = tensor([[2, 3, 4, 5, 6, 7, 8, 9]])
mems  # from the previous output
```

I currently pass this data into the model like this:

```python
output = model(input_ids=data, labels=target, mems=mems)
```
Is this correct?
I am wondering because the documentation says for the `labels` parameter:

> labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*, defaults to `None`): Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set `lm_labels = input_ids`
So what is it about the parameter `lm_labels`? I only see `labels` defined in the `forward` method.
And when the labels “are shifted” inside the model, does this mean I have to pass `data` in twice (for `input_ids` and `labels`) because the labels are shifted inside? But how does the model then know the next token to predict (in the case above: `9`)?
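To spell out the two conventions I can imagine, here is a small, purely illustrative sketch (this is the generic causal-LM shifting pattern, not necessarily what `TransfoXLLMHeadModel` actually does internally, which is exactly what I’m unsure about):

```python
import torch

vocab_size = 10
logits = torch.randn(1, 8, vocab_size)            # fake model outputs for positions 1..8
data = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
target = torch.tensor([[2, 3, 4, 5, 6, 7, 8, 9]])

# Convention A: targets are pre-shifted outside the model (what LMOrderedIterator
# yields). Every position has a label, including token 9 for the last position,
# because the target comes from the continuation of the token stream.
loss_a = torch.nn.functional.cross_entropy(
    logits.view(-1, vocab_size), target.view(-1)
)

# Convention B: labels == input_ids and the shift happens inside the model.
# The last position is dropped, so token 9 is never predicted from this batch
# alone -- it would only appear as a label in the next batch.
labels = data
loss_b = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), labels[:, 1:].reshape(-1)
)
```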
I also read through this bug and the fix in this pull request, but I don’t quite understand how to treat the model now (before vs. after the fix). Maybe someone could explain both versions to me.
Thanks in advance for any help!
A link to original question on Stack Overflow: https://stackoverflow.com/q/62069350/9478384
Top GitHub Comments
Note that if you are using the state, the memory returned is computed on the whole `[1, ..., 8]`, so you should use `[9, 10, ..., 16]` as your next batch.

Yes, it was changed in 2.9.0. You should probably consider updating 😉
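A quick sketch of that non-overlapping feeding pattern, under the assumption that the model output exposes the returned memories as a `.mems` attribute (on older, tuple-returning versions of transformers the memories are one of the tuple entries instead):

```python
import torch
from transformers import TransfoXLLMHeadModel

model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

# Two consecutive, non-overlapping segments of the same token stream.
batch_1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
batch_2 = torch.tensor([[9, 10, 11, 12, 13, 14, 15, 16]])

out_1 = model(input_ids=batch_1, mems=None)   # first segment: no memory yet
mems = out_1.mems                             # memory computed on the whole 1..8

# Second segment: reuse that memory and feed the continuation 9..16 rather
# than an overlapping window.
out_2 = model(input_ids=batch_2, mems=mems)
```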