Different result in AutoModelForCausalLM

🚀 Feature request

Models loaded through AutoModelForCausalLM behave differently when computing the loss.

In BartForCausalLM no shift is applied in the loss calculation: https://github.com/huggingface/transformers/blob/b013842244df7be96b8cc841491bd1e35e475e36/src/transformers/models/bart/modeling_bart.py#L1745

loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.config.vocab_size), labels.view(-1))

In RobertaForCausalLM a shift is applied before the loss calculation: https://github.com/huggingface/transformers/blob/b013842244df7be96b8cc841491bd1e35e475e36/src/transformers/models/roberta/modeling_roberta.py#L944

# we are doing next-token prediction; shift prediction scores and input ids by one
shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()
labels = labels[:, 1:].contiguous()
loss_fct = CrossEntropyLoss()
lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
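
To make the difference concrete, here is a minimal sketch (random logits and labels, made-up shapes) that applies both loss computations to the same tensors:

import torch
from torch.nn import CrossEntropyLoss

batch_size, seq_len, vocab_size = 2, 5, 11  # made-up shapes for illustration
logits = torch.randn(batch_size, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, seq_len))
loss_fct = CrossEntropyLoss()

# BartForCausalLM style: logits at position t are compared to the label at position t.
bart_style_loss = loss_fct(logits.view(-1, vocab_size), labels.view(-1))

# RobertaForCausalLM style: logits at position t are compared to the label at position t + 1.
shifted_logits = logits[:, :-1, :].contiguous()
shifted_labels = labels[:, 1:].contiguous()
roberta_style_loss = loss_fct(shifted_logits.view(-1, vocab_size), shifted_labels.view(-1))

print(bart_style_loss, roberta_style_loss)  # generally different values for the same inputs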

Motivation

I found this when I switched the config from RoBERTa to BART in AutoModelForCausalLM: the loss changed because the labels are handled differently. It would be nice if all CausalLM models handled labels the same way, either shifted or not.

Your contribution

I can open a PR to make sure that all the models apply the shift before computing the loss.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
patil-suraj commented, Mar 2, 2021

BartForCausalLM does accept labels == input_ids. In general, all the decoders in EncoderDecoderModel accept that, and that’s what we have documented: pass the same input as labels and decoder_input_ids.

The reason I suggested using shift_tokens_right is that BART uses eos as the decoder_start_token, which the shift_tokens_right function handles. This is different from RobertaForCausalLM, GPT2LMHeadModel, ...
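
For context, an illustrative sketch of what a shift_tokens_right-style helper does (this reimplements the idea for a toy example; it is not the library function itself): decoder_input_ids are built from the labels by prepending the decoder start token, which for BART is eos.

import torch

# Illustrative helper (assumed behavior): prepend the decoder start token and drop the last label.
def shift_right(labels: torch.Tensor, decoder_start_token_id: int, pad_token_id: int) -> torch.Tensor:
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    shifted.masked_fill_(shifted == -100, pad_token_id)  # ignored label positions become pad
    return shifted

labels = torch.tensor([[10, 11, 12, 2]])  # toy ids, with 2 standing in for eos
print(shift_right(labels, decoder_start_token_id=2, pad_token_id=1))
# tensor([[ 2, 10, 11, 12]]) -- the decoder starts from eos, as BART does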

1 reaction
patrickvonplaten commented, Mar 2, 2021

Hmm, I’m not 100% sure whether everybody is on the same page here. BartForCausalLM was mostly created to be used in combination with EncoderDecoderModel and not as a standalone model. Also, Roberta requires both input_ids and labels as input to correctly calculate the loss - the difference is just that input_ids should be equal to labels, with the labels being shifted under-the-hood. This is not the same thing as the shift_tokens_right function, which fully generates the decoder_input_ids from the labels…

I think I would be fine with changing the behavior of BartForCausalLM so that labels == input_ids can be passed to the model, even if this would be a slight breaking change. It would align BartForCausalLM more closely with RobertaForCausalLM, GPT2LMHeadModel, ..., which would then also allow EncoderDecoderModel to have a general shift_tokens function.

Does this make sense?
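
For reference, a minimal runnable sketch of the decoder-only convention described above (tiny randomly initialized config and random ids, purely for illustration): the same tensor is passed as input_ids and labels, and the shift happens inside the model.

import torch
from transformers import RobertaConfig, RobertaForCausalLM

# Tiny config for illustration only; not a real checkpoint.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    is_decoder=True,  # use RoBERTa as a decoder / causal LM
)
model = RobertaForCausalLM(config)

input_ids = torch.randint(3, 100, (1, 8))  # random ids, avoiding the special tokens 0/1/2
# labels == input_ids; RobertaForCausalLM shifts the labels by one position internally.
loss = model(input_ids=input_ids, labels=input_ids).loss
print(loss)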
