
LongT5 Models Are Not Initialized With Pretrained Weights


System Info

  • transformers version: 4.20.1
  • Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.13
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.11.0+cu113 (True)
  • Tensorflow version (GPU?): 2.8.2 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@LysandreJik @stancld @patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I have tried fine-tuning LongT5 on a long-range summarization task with a custom dataset (think of it like CNN/DM in that it is highly extractive). While long-t5-tglobal-base works well (I converge to a validation loss of ~1.25 and ROUGE-2 of ~21), long-t5-local-base, long-t5-local-large, and long-t5-tglobal-large all give me training/validation losses of 200+ with ROUGE scores of exactly 0, which makes me believe these models haven't actually been initialized with Google's weights (a quick sanity check that needs no training at all is sketched after the numbers below). Here are the JSON outputs from trainer.evaluate() after 1 epoch of training:

google/long-t5-local-base:
{'epoch': 1.0, 'eval_gen_len': 1023.0, 'eval_loss': 366.21673583984375, 'eval_rouge1': 0.0, 'eval_rouge2': 0.0, 'eval_rougeL': 0.0, 'eval_rougeLsum': 0.0, 'eval_runtime': 37.9896, 'eval_samples_per_second': 0.132, 'eval_steps_per_second': 0.053}

google/long-t5-tglobal-base (this one works correctly):
{'epoch': 1.0, 'eval_gen_len': 708.2, 'eval_loss': 1.6017440557479858, 'eval_rouge1': 35.7791, 'eval_rouge2': 11.5732, 'eval_rougeL': 19.1541, 'eval_rougeLsum': 31.8491, 'eval_runtime': 34.8695, 'eval_samples_per_second': 0.143, 'eval_steps_per_second': 0.057}

google/long-t5-local-large:
{'epoch': 0.77, 'eval_gen_len': 1023.0, 'eval_loss': 252.44662475585938, 'eval_rouge1': 0.0, 'eval_rouge2': 0.0, 'eval_rougeL': 0.0, 'eval_rougeLsum': 0.0, 'eval_runtime': 89.2506, 'eval_samples_per_second': 0.056, 'eval_steps_per_second': 0.034}

google/long-t5-tglobal-large:
{'epoch': 0.77, 'eval_gen_len': 1023.0, 'eval_loss': 241.6276397705078, 'eval_rouge1': 0.0, 'eval_rouge2': 0.0, 'eval_rougeL': 0.0, 'eval_rougeLsum': 0.0, 'eval_runtime': 89.9801, 'eval_samples_per_second': 0.056, 'eval_steps_per_second': 0.033}
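To make the symptom visible without any training, here is a minimal sanity-check sketch (assuming transformers 4.20.x; the input and target strings are arbitrary): a checkpoint whose weights actually loaded should produce a modest loss on a trivial pair, while a randomly initialized one should stand out with a loss that is orders of magnitude larger.

```python
# Minimal sanity check: compare the out-of-the-box loss of each checkpoint
# on a trivial input/target pair, before any fine-tuning. The exact values
# are illustrative; a randomly initialized model should stand out.
import torch
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

for name in ["google/long-t5-local-base", "google/long-t5-tglobal-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    model = LongT5ForConditionalGeneration.from_pretrained(name).eval()
    inputs = tok("summarize: The quick brown fox jumps over the lazy dog.",
                 return_tensors="pt")
    labels = tok("A fox jumps over a dog.", return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss
    print(f"{name}: loss = {loss.item():.2f}")
```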

For reproduction, just run the standard Hugging Face PyTorch training script for summarization on any official dataset (CNN/DM, XSum, etc.); a condensed sketch follows.
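A condensed version of that reproduction (hypothetical hyperparameters throughout; the official script lives at examples/pytorch/summarization/run_summarization.py, and ROUGE computation is omitted for brevity since eval_loss alone separates the working checkpoint from the broken ones) might look like:

```python
# Condensed reproduction sketch on XSum (hyperparameters are placeholders;
# match your own setup). Swap model_name for any of the four checkpoints.
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForSeq2Seq,
                          LongT5ForConditionalGeneration, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/long-t5-local-base"
tok = AutoTokenizer.from_pretrained(model_name)
model = LongT5ForConditionalGeneration.from_pretrained(model_name)

ds = load_dataset("xsum")

def preprocess(batch):
    model_inputs = tok(batch["document"], max_length=4096, truncation=True)
    with tok.as_target_tokenizer():  # transformers 4.20-era target tokenization
        model_inputs["labels"] = tok(batch["summary"], max_length=512,
                                     truncation=True)["input_ids"]
    return model_inputs

tokenized = ds.map(preprocess, batched=True,
                   remove_columns=ds["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="longt5-xsum",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-3,              # placeholder
    num_train_epochs=1,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tok, model=model),
    tokenizer=tok,
)
trainer.train()
print(trainer.evaluate())  # broken checkpoints show eval_loss in the hundreds
```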

Note that I haven't tried the 3B-parameter versions, so I cannot speak to whether this problem affects them as well.

Expected behavior

All four models should reach a low validation loss when fine-tuning on summarization (as opposed to three of them showing 200+ validation losses, as if they were randomly initialized).

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

reelmath commented on Jul 9, 2022 (1 reaction)

Update 2: Loading from Flax works for long-t5-tglobal-large and long-t5-local-base, but does not work for long-t5-local-large (which starts and flatlines at a training and validation loss of around 10).
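For anyone hitting the same issue, the workaround described above amounts to passing from_flax=True to from_pretrained (a sketch assuming the flax package is installed; the save directory name is arbitrary):

```python
# Workaround sketch: load the Flax checkpoint and convert it to PyTorch
# at load time, then save the converted weights for reuse.
from transformers import LongT5ForConditionalGeneration

model = LongT5ForConditionalGeneration.from_pretrained(
    "google/long-t5-tglobal-large", from_flax=True
)
model.save_pretrained("long-t5-tglobal-large-pt")  # cache the converted weights
```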

github-actions[bot] commented on Aug 21, 2022 (0 reactions)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
