
Training Transformer XL from scratch

See original GitHub issue

Hello, I am trying to recreate this notebook https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb for Transformer-XL. I made the following changes to the tokenizer:

%%time 
from pathlib import Path
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers import normalizers
from tokenizers.normalizers import Lowercase, NFD, StripAccents
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing
from tokenizers.trainers import WordPieceTrainer
from tokenizers.trainers import WordLevelTrainer


tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Sequence([NFD(), Lowercase(), StripAccents()])
tokenizer.pre_tokenizer = Whitespace()


tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", 1),
        ("[SEP]", 2),
    ],
)

trainer = WordLevelTrainer(show_progress=True, special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])

files = [str(x) for x in Path(".").glob("**/*.txt")]

tokenizer.train(files, trainer)

tokenizer.save("espertransXL.json")
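
(As a quick sanity check, not part of the original issue, the saved tokenizer can be reloaded and run on a sample sentence; the Esperanto text below is just a placeholder.)

from tokenizers import Tokenizer

# Sketch only: reload the trained WordLevel tokenizer and inspect its output.
tok = Tokenizer.from_file("espertransXL.json")
enc = tok.encode("Mi estas knabo")
print(enc.tokens)  # lowercased, accent-stripped word-level tokens wrapped in [CLS] ... [SEP]
print(enc.ids)     # vocabulary ids; out-of-vocabulary words map to [UNK]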

and then loaded it into a PreTrainedTokenizerFast:

from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast(tokenizer_file="espertransXL.json")

# Register the special tokens on the wrapped tokenizer
tokenizer.bos_token = "[CLS]"
tokenizer.eos_token = "[SEP]"
tokenizer.sep_token = "[SEP]"
tokenizer.cls_token = "[CLS]"
tokenizer.unk_token = "[UNK]"
tokenizer.pad_token = "[PAD]"
tokenizer.mask_token = "[MASK]"
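
(The same special tokens can also be passed straight to the PreTrainedTokenizerFast constructor instead of being assigned afterwards; a minimal equivalent of the block above:)

from transformers import PreTrainedTokenizerFast

# Sketch only: register the special tokens via constructor kwargs.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="espertransXL.json",
    bos_token="[CLS]",
    eos_token="[SEP]",
    sep_token="[SEP]",
    cls_token="[CLS]",
    unk_token="[UNK]",
    pad_token="[PAD]",
    mask_token="[MASK]",
)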

After that, I instantiated the model:

from transformers import TransfoXLConfig, TransfoXLModel

config = TransfoXLConfig()
model = TransfoXLModel(config=config)
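
(One detail worth double-checking, not raised in the issue itself: a default TransfoXLConfig ships with the WikiText-103 vocabulary size and adaptive-softmax cutoffs, which will not match a freshly trained tokenizer. A hedged sketch of aligning them, reusing the tokenizer loaded above:)

from transformers import TransfoXLConfig, TransfoXLModel

# Sketch only: size the embeddings and softmax to the trained tokenizer.
# cutoffs=[] disables the adaptive-softmax clusters, whose defaults assume a
# ~267k-token vocabulary and would exceed a small custom vocab.
config = TransfoXLConfig(vocab_size=tokenizer.vocab_size, cutoffs=[])
model = TransfoXLModel(config=config)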

Set up the data collator:

from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

Then I set up the Trainer as follows:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./TransfoXL",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_gpu_train_batch_size=16,
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

When I execute:

%%time
trainer.train()

I get the following error:

TypeError                                 Traceback (most recent call last)
<timed eval> in <module>

/opt/conda/envs/Python-3.7-CUDA/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
   1270                         tr_loss += self.training_step(model, inputs)
   1271                 else:
-> 1272                     tr_loss += self.training_step(model, inputs)
   1273                 self.current_flos += float(self.floating_point_ops(inputs))
   1274 

/opt/conda/envs/Python-3.7-CUDA/lib/python3.7/site-packages/transformers/trainer.py in training_step(self, model, inputs)
   1732                 loss = self.compute_loss(model, inputs)
   1733         else:
-> 1734             loss = self.compute_loss(model, inputs)
   1735 
   1736         if self.args.n_gpu > 1:

/opt/conda/envs/Python-3.7-CUDA/lib/python3.7/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   1764         else:
   1765             labels = None
-> 1766         outputs = model(**inputs)
   1767         # Save past state if it exists
   1768         # TODO: this needs to be fixed and made cleaner later.

/opt/conda/envs/Python-3.7-CUDA/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

TypeError: forward() got an unexpected keyword argument 'attention_mask'

Can someone please advise on this, or point to a working notebook example if they have one?

Thanks
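
For context, the traceback comes down to an argument mismatch: the collated batch contains an attention_mask, but TransfoXLModel.forward does not accept that keyword, so model(**inputs) fails inside the Trainer. A quick way to confirm this against the installed transformers version:

import inspect

from transformers import TransfoXLModel

# Sketch only: the printed signature should show no `attention_mask` parameter.
print(inspect.signature(TransfoXLModel.forward))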

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

gugarosa commented on Aug 13, 2021

Hello @vishrawas!

You could subclass TransfoXLLMHeadModel and change its output key from losses to loss so that it works with the Trainer. Note that you will probably have to reduce the loss before returning it, as it has not been reduced yet, for example with loss.mean():

from transformers import TransfoXLLMHeadModel


class OwnTransfoXLLMHeadModel(TransfoXLLMHeadModel):
    def __init__(self, *args, **kwargs) -> None:
        super(OwnTransfoXLLMHeadModel, self).__init__(*args, **kwargs)

    def forward(
        self,
        input_ids=None,
        mems=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if input_ids is not None:
            bsz, tgt_len = input_ids.size(0), input_ids.size(1)
        elif inputs_embeds is not None:
            bsz, tgt_len = inputs_embeds.size(0), inputs_embeds.size(1)
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        transformer_outputs = self.transformer(
            input_ids,
            mems=mems,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden = transformer_outputs[0]
        pred_hid = last_hidden[:, -tgt_len:]

        softmax_output = self.crit(pred_hid, labels)
        prediction_scores = softmax_output.view(bsz, tgt_len, -1) if labels is None else ()
        loss = softmax_output.view(bsz, tgt_len - 1) if labels is not None else None
        # Reduce the per-token negative log-likelihood to a scalar for the Trainer.
        if loss is not None:
            loss = loss.mean()

        if not return_dict:
            output = (prediction_scores,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return TransfoXLLMHeadModelOutput(
            loss=loss,
            prediction_scores=prediction_scores,
            mems=transformer_outputs.mems,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

Additionally, you will need to define your own TransfoXLLMHeadModelOutput in the same way the original does, subclassing ModelOutput and renaming the losses field to loss:

from dataclasses import dataclass
from typing import List, Optional, Tuple

import torch
from transformers.file_utils import ModelOutput


@dataclass
class TransfoXLLMHeadModelOutput(ModelOutput):
    loss: Optional[torch.FloatTensor] = None
    prediction_scores: torch.FloatTensor = None
    mems: List[torch.FloatTensor] = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None

    @property
    def logits(self):
        return self.prediction_scores
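
As a rough sketch of how the subclass could be exercised before handing it to the Trainer (the tiny config values are arbitrary assumptions, and tokenizer is the PreTrainedTokenizerFast built in the question), the point is only that the output now carries loss instead of losses:

import torch
from transformers import TransfoXLConfig

# Sketch only: a deliberately small config so the check runs quickly.
# cutoffs=[] turns off the adaptive-softmax clusters, which otherwise assume a
# WikiText-103-sized vocabulary.
config = TransfoXLConfig(
    vocab_size=tokenizer.vocab_size,
    cutoffs=[],
    n_layer=2,
    d_model=128,
    d_embed=128,
    n_head=2,
    d_head=32,
    d_inner=256,
)
model = OwnTransfoXLLMHeadModel(config)

# Dummy batch of token ids; labels == input_ids is the usual causal-LM setup.
dummy = torch.randint(0, config.vocab_size, (2, 16))
out = model(input_ids=dummy, labels=dummy, return_dict=True)
print(out.loss)  # a scalar tensor the Trainer can backpropagate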

siwei-li commented on Nov 24, 2021

@LysandreJik Thank you for the reply. I made those changes and, while that error is resolved, I am now getting the error KeyError: 'loss'. Searching the internet suggests this error comes up when labels are not defined, but I believe I have defined them. I have created this public notebook for Transformer-XL: https://colab.research.google.com/drive/1vMVoPhtkHFC_-0X-hgwHvH03ynGT0j5i?usp=sharing . Can you please check and advise?

I would be happy to publish this as a tutorial/example once it is working, as I see this question on training Transformer-XL has come up in the past.

Hello there! I wonder if you have an updated version of the transformer-XL notebook? Thank you for your help!
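
For anyone hitting the same KeyError: 'loss', the Trainer's compute_loss reads outputs["loss"], so the error means the model output carried no loss entry, either because no labels reached the model or because the output still uses the losses key. The labels themselves come from the data collator, so one hedged check (mlm=False and the sample sentences are assumptions, not from the thread) is:

from transformers import DataCollatorForLanguageModeling, PreTrainedTokenizerFast

# Sketch only: confirm the collator puts a `labels` entry into each batch.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="espertransXL.json", pad_token="[PAD]")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

batch = collator([tokenizer("la hundo kuras"), tokenizer("saluton")])
print(batch.keys())  # should include 'labels' alongside input_ids / attention_mask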
