
Transformer-XL: Input and labels for Language Modeling

See original GitHub issue

❓ Questions & Help

Details

I’m trying to fine-tune the pretrained Transformer-XL model transfo-xl-wt103 for a language modeling task. Therefore, I use the model class TransfoXLLMHeadModel.

To iterate over my dataset I use the LMOrderedIterator from the file tokenization_transfo_xl.py, which yields a tensor with the data and its target for each batch (and the sequence length).
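
For illustration, here is a simplified stand-in for such an iterator (a toy sketch, not the actual LMOrderedIterator implementation): it slices a flat token stream into batch-first (data, target) windows of length bptt, with target being the data shifted by one position.

import torch

def ordered_lm_batches(token_ids, batch_size, bptt):
    # Toy sketch: reshape the flat token stream to (batch_size, n_steps)
    # and slide a bptt-sized window over it; the target is the data
    # shifted by one position.
    n_steps = len(token_ids) // batch_size
    stream = torch.tensor(token_ids[: n_steps * batch_size]).view(batch_size, n_steps)
    for i in range(0, n_steps - 1, bptt):
        seq_len = min(bptt, n_steps - 1 - i)
        data = stream[:, i : i + seq_len]            # e.g. tensor([[1, 2, ..., 8]])
        target = stream[:, i + 1 : i + 1 + seq_len]  # e.g. tensor([[2, 3, ..., 9]])
        yield data, target, seq_len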

My question: Let’s assume the following data with batch_size = 1 and bptt = 8:

data = tensor([[1,2,3,4,5,6,7,8]])
target = tensor([[2,3,4,5,6,7,8,9]])
mems # from the previous output

I currently pass this data into the model like this:

output = model(input_ids=data, labels=target, mems=mems)

Is this correct?

I am wondering because the documentation says for the labels parameter:

labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None): Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set lm_labels = input_ids

So what is this lm_labels parameter? I only see labels defined in the forward method.

And if the labels “are shifted” inside the model, does this mean I have to pass the data in twice (as input_ids and as labels) because the labels are shifted inside? But how does the model then know the next token to predict (in the case above: 9)?
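
For reference, “shifted inside the model” means roughly the following (an illustrative sketch, not the actual Transformer-XL code, which uses an adaptive softmax): with labels = input_ids, the logits produced at positions 1..7 are scored against the tokens at positions 2..8.

import torch
import torch.nn.functional as F

# Illustrative only: `logits` stands in for the model's output scores.
vocab_size = 16
input_ids = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
labels = input_ids                                      # "labels = input_ids"
logits = torch.randn(1, input_ids.size(1), vocab_size)  # fake model output

shift_logits = logits[:, :-1, :]   # predictions made at positions 1..7
shift_labels = labels[:, 1:]       # targets are the tokens at positions 2..8
loss = F.cross_entropy(shift_logits.reshape(-1, vocab_size), shift_labels.reshape(-1))
# Token 9 is not a prediction target in this batch at all; the window boundary
# is handled via the returned mems and the next batch (see the answers below).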

I also read through this bug and the fix in this pull request, but I don’t quite understand how to treat the model now (before vs. after the fix). Maybe someone could explain both versions to me.

Thanks in advance for some help!

A link to the original question on Stack Overflow: https://stackoverflow.com/q/62069350/9478384

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
sgugger commented, Jun 2, 2020

If you set labels = input_ids (i.e. [1, ... , 8]), the model will then attempt to predict [2, ... , 8] from [1, ... , 7].

Note that if you are using the state, the memory returned is computed on the whole [1, ... , 8], so you should use [9, 10, ... , 16] as your next batch.

1 reaction
TevenLeScao commented, Jun 2, 2020

Yes, it was changed in 2.9.0. You should probably consider updating 😉
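
Putting the thread together, a minimal training-loop sketch could look like the following. This assumes transformers >= 2.9 and, more specifically, a version recent enough that the LM head model returns a ModelOutput with per-token losses and the updated mems rather than a plain tuple; adjust the output handling for older releases.

import torch
from transformers import TransfoXLLMHeadModel

model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.train()

batch_1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
batch_2 = torch.tensor([[9, 10, 11, 12, 13, 14, 15, 16]])

mems = None
for data in (batch_1, batch_2):
    # labels = input_ids: the shift happens inside the model, so this call
    # scores the predictions of data[:, 1:] given data[:, :-1] (plus mems).
    outputs = model(input_ids=data, labels=data, mems=mems)
    loss = outputs.losses.mean()   # per-token losses; reduce them yourself
    loss.backward()
    mems = outputs.mems            # carry the memory over to the next batch
    # (optimizer step / zero_grad omitted for brevity)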


Top Results From Across the Web

Language modeling - Hugging Face
Language modeling tasks predict words in a sentence, making these types of models great at generating text. You can use these...

Transformer-XL: Input and labels for Language Modeling
I only see labels defined in the forward method. And when the labels "are shifted" inside the model, does this mean I have...

Language Modeling with nn.Transformer and TorchText
This is a tutorial on training a sequence-to-sequence model that uses the nn.Transformer module.

Hugging Face Pre-trained Models: Find the Best One for Your ...
Transformers are language models and have been trained on a large amount of ... Second, we will define a data collator to pad...

Neural machine translation with a Transformer and Keras | Text
Transformers excel at modeling sequential data, such as natural language. ... Keras Model.fit training expects (inputs, labels) pairs. The inputs are pairs ...
