Transformer-XL: Input and labels for Language Modeling
❓ Questions & Help
Details
I’m trying to fine-tune the pretrained Transformer-XL model `transfo-xl-wt103` for a language modeling task. Therefore, I use the model class `TransfoXLLMHeadModel`.
To iterate over my dataset I use the `LMOrderedIterator` from the file `tokenization_transfo_xl.py`, which yields a tensor with the data and its target for each batch (plus the sequence length).
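To make the setup concrete, here is a minimal sketch of how these pieces fit together (the corpus handling and training loop are simplified placeholders of my own, and I’m assuming `LMOrderedIterator` is importable from `transformers.tokenization_transfo_xl` as in my installed version):

```python
import torch
from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer
from transformers.tokenization_transfo_xl import LMOrderedIterator

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.train()

# Hypothetical corpus: encode the raw text into one flat 1-D tensor of token ids.
text = "some long training text ..."
token_ids = torch.tensor(tokenizer.encode(text), dtype=torch.long)

# LMOrderedIterator splits the flat stream into `bsz` parallel streams and yields
# (data, target, seq_len) per step, where `target` is `data` shifted by one token.
iterator = LMOrderedIterator(token_ids, bsz=1, bptt=8)

for data, target, seq_len in iterator:
    output = model(input_ids=data, labels=target)
    # ... loss extraction, backward pass and optimizer step omitted here
```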
My question:
Let’s assume the following data with `batch_size = 1` and `bptt = 8`:

```python
data = tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
target = tensor([[2, 3, 4, 5, 6, 7, 8, 9]])
mems  # from the previous output
```

I currently pass this data into the model like this:

```python
output = model(input_ids=data, labels=target, mems=mems)
```
Is this correct?
I am wondering because the documentation says for the `labels` parameter:

> labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*, defaults to `None`): Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set `lm_labels = input_ids`
So what is it about the parameter `lm_labels`? I only see `labels` defined in the `forward` method.
And when the labels “are shifted” inside the model, does this mean I have to pass `data` in twice (for `input_ids` and `labels`) because the labels are shifted inside? But how does the model then know the next token to predict (in the case above: `9`)?
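To spell out the two conventions I can imagine, here is a small, purely illustrative sketch (this is the generic causal-LM shifting pattern, not necessarily what `TransfoXLLMHeadModel` actually does internally, which is exactly what I’m unsure about):

```python
import torch

vocab_size = 10
logits = torch.randn(1, 8, vocab_size)            # fake model outputs for positions 1..8
data = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
target = torch.tensor([[2, 3, 4, 5, 6, 7, 8, 9]])

# Convention A: targets are pre-shifted outside the model (what LMOrderedIterator
# yields). Every position has a label, including token 9 for the last position,
# because the target comes from the continuation of the token stream.
loss_a = torch.nn.functional.cross_entropy(
    logits.view(-1, vocab_size), target.view(-1)
)

# Convention B: labels == input_ids and the shift happens inside the model.
# The last position is dropped, so token 9 is never predicted from this batch
# alone -- it would only appear as a label in the next batch.
labels = data
loss_b = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), labels[:, 1:].reshape(-1)
)
```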
I also read through this bug and the fix in this pull request, but I don’t quite understand how to treat the model now (before vs. after the fix). Maybe someone could explain both versions to me.
Thanks in advance for any help!
A link to original question on Stack Overflow: https://stackoverflow.com/q/62069350/9478384
Top GitHub Comments
Note that if you are using the state, the memory returned is computed on the whole `[1, ..., 8]`, so you should use `[9, 10, ..., 16]` as your next batch.

Yes, it was changed in 2.9.0. You should probably consider updating 😉
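A quick sketch of that non-overlapping feeding pattern, under the assumption that the model output exposes the returned memories as a `.mems` attribute (on older, tuple-returning versions of transformers the memories are one of the tuple entries instead):

```python
import torch
from transformers import TransfoXLLMHeadModel

model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

# Two consecutive, non-overlapping segments of the same token stream.
batch_1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
batch_2 = torch.tensor([[9, 10, 11, 12, 13, 14, 15, 16]])

out_1 = model(input_ids=batch_1, mems=None)   # first segment: no memory yet
mems = out_1.mems                             # memory computed on the whole 1..8

# Second segment: reuse that memory and feed the continuation 9..16 rather
# than an overlapping window.
out_2 = model(input_ids=batch_2, mems=mems)
```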