Different result in AutoModelForCausalLM
🚀 Feature request
Models loaded through `AutoModelForCausalLM` behave differently in how they calculate the loss.
In `BartForCausalLM`, no shift is applied in the loss calculation: https://github.com/huggingface/transformers/blob/b013842244df7be96b8cc841491bd1e35e475e36/src/transformers/models/bart/modeling_bart.py#L1745
```python
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.config.vocab_size), labels.view(-1))
```
In `RobertaForCausalLM`, a shift is applied before the loss calculation: https://github.com/huggingface/transformers/blob/b013842244df7be96b8cc841491bd1e35e475e36/src/transformers/models/roberta/modeling_roberta.py#L944
```python
# we are doing next-token prediction; shift prediction scores and input ids by one
shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()
labels = labels[:, 1:].contiguous()
loss_fct = CrossEntropyLoss()
lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
```
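To see the practical effect, here is a minimal sketch (plain PyTorch on random tensors, not the library's actual forward pass) that applies the two formulations above to the same logits and labels, i.e. assuming `labels == input_ids`:

```python
import torch
from torch.nn import CrossEntropyLoss

batch_size, seq_len, vocab_size = 2, 8, 50
logits = torch.randn(batch_size, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, seq_len))

loss_fct = CrossEntropyLoss()

# BartForCausalLM style: logits at position i are scored against the label at position i.
bart_style_loss = loss_fct(logits.view(-1, vocab_size), labels.view(-1))

# RobertaForCausalLM / GPT2LMHeadModel style: logits at position i are scored
# against the label at position i + 1 (next-token prediction).
shifted_logits = logits[:, :-1, :].contiguous()
shifted_labels = labels[:, 1:].contiguous()
roberta_style_loss = loss_fct(
    shifted_logits.view(-1, vocab_size), shifted_labels.view(-1)
)

# The two values differ, so switching the config from Roberta to BART silently
# changes what the loss means.
print(bart_style_loss.item(), roberta_style_loss.item())
```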
Motivation
I ran into this when I switched the config from Roberta to BART in `AutoModelForCausalLM`: the loss changed because the two models treat the labels differently. It would be nice if all CausalLM models handled labels in the same way, either shifting or not.
Your contribution
I can make a PR to make sure that all the models apply the shift before computing the loss.

Top GitHub Comments
`BartForCausalLM` does accept `labels == input_ids`; in general, all the decoders in `EncoderDecoder` accept that, and that's what we have documented: pass the same input as `labels` and `decoder_input_ids`. The reason I suggested using `shift_tokens_right` is that BART uses `eos` as the `decoder_start_token`, which the `shift_tokens_right` function handles. This is different from `RobertaForCausalLM`, `GPT2LMHeadModel`, ...
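For reference, a small sketch of what `shift_tokens_right` does when building `decoder_input_ids` from the labels (the token ids below are only illustrative values; the import path matches the version linked above):

```python
import torch
from transformers.models.bart.modeling_bart import shift_tokens_right

pad_token_id = 1
decoder_start_token_id = 2  # BART starts decoding from eos

# Illustrative label sequence: <s> ... </s>
labels = torch.tensor([[0, 31414, 232, 2]])

decoder_input_ids = shift_tokens_right(labels, pad_token_id, decoder_start_token_id)
print(decoder_input_ids)  # tensor([[2, 0, 31414, 232]]): eos prepended, sequence shifted right
```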
Hmm, I'm not 100% sure whether everybody is on the same page here. `BartForCausalLM` was mostly created to be used in combination with `EncoderDecoderModel` and not as a standalone model. Also, Roberta requires both `input_ids` and `labels` as an input to correctly calculate the loss - the difference is just that `input_ids` should be equal to `labels`, with the labels being shifted under the hood. This is not the same thing as the `shift_tokens_right` function, which fully generates the `decoder_input_ids` from the labels… I think I would be fine with changing the behavior of `BartForCausalLM` so that `labels == input_ids` can be passed to the function, even if this would be a slight breaking change. It would align `BartForCausalLM` more closely with `RobertaForCausalLM`, `GPT2LMHeadModel`, ..., which would then also allow `EncoderDecoderModel` to have a general `shift_tokens` function. Does this make sense?
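Until such a change lands, a standalone training step for `BartForCausalLM` (at the version linked above, which does not shift internally) has to do the shift itself. A hedged sketch, using a tiny randomly initialised config whose sizes are arbitrary and chosen only to keep the example self-contained:

```python
import torch
from transformers import BartConfig, BartForCausalLM
from transformers.models.bart.modeling_bart import shift_tokens_right

# Tiny config with random weights; the sizes are illustration values only.
config = BartConfig(
    vocab_size=100,
    d_model=16,
    decoder_layers=1,
    decoder_attention_heads=2,
    decoder_ffn_dim=32,
    pad_token_id=1,
    eos_token_id=2,
    decoder_start_token_id=2,
)
model = BartForCausalLM(config)

labels = torch.randint(3, config.vocab_size, (2, 6))

# Because the loss is computed without an internal shift (see the snippet quoted
# above), the input has to be the right-shifted labels for the loss to mean
# next-token prediction.
input_ids = shift_tokens_right(labels, config.pad_token_id, config.decoder_start_token_id)

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)
```

With the proposed change, the last two lines could instead take `labels == input_ids` directly, matching `RobertaForCausalLM` and `GPT2LMHeadModel`.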