LongT5 Models Are Not Initialized With Pretrained Weights
System Info
- transformers version: 4.20.1
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- Tensorflow version (GPU?): 2.8.2 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
@LysandreJik @stancld @patrickvonplaten
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I have tried fine-tuning LongT5 on a long-range summarization task with a custom dataset (think CNN/DM in that it is highly extractive). While long-t5-tglobal-base works well (I converge to a validation loss of ~1.25 and ROUGE-2 of ~21), long-t5-local-base, long-t5-local-large, and long-t5-tglobal-large all give me training/validation losses of 200+ with ROUGE scores of exactly 0, which makes me believe these models haven't actually been initialized with Google's weights. Here are the JSON outputs from trainer.evaluate() after 1 epoch of training (a sanity-check sketch follows the results below):
google/long-t5-local-base: {'epoch': 1.0, 'eval_gen_len': 1023.0, 'eval_loss': 366.21673583984375, 'eval_rouge1': 0.0, 'eval_rouge2': 0.0, 'eval_rougeL': 0.0, 'eval_rougeLsum': 0.0, 'eval_runtime': 37.9896, 'eval_samples_per_second': 0.132, 'eval_steps_per_second': 0.053}
google/long-t5-tglobal-base (this one works correctly): {'epoch': 1.0, 'eval_gen_len': 708.2, 'eval_loss': 1.6017440557479858, 'eval_rouge1': 35.7791, 'eval_rouge2': 11.5732, 'eval_rougeL': 19.1541, 'eval_rougeLsum': 31.8491, 'eval_runtime': 34.8695, 'eval_samples_per_second': 0.143, 'eval_steps_per_second': 0.057}
google/long-t5-local-large: {'epoch': 0.77, 'eval_gen_len': 1023.0, 'eval_loss': 252.44662475585938, 'eval_rouge1': 0.0, 'eval_rouge2': 0.0, 'eval_rougeL': 0.0, 'eval_rougeLsum': 0.0, 'eval_runtime': 89.2506, 'eval_samples_per_second': 0.056, 'eval_steps_per_second': 0.034}
google/long-t5-tglobal-large: {'epoch': 0.77, 'eval_gen_len': 1023.0, 'eval_loss': 241.6276397705078, 'eval_rouge1': 0.0, 'eval_rouge2': 0.0, 'eval_rougeL': 0.0, 'eval_rougeLsum': 0.0, 'eval_runtime': 89.9801, 'eval_samples_per_second': 0.056, 'eval_steps_per_second': 0.033}
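As a quick sanity check (my own sketch, not from the issue; the probe text and target are arbitrary), you can compare each checkpoint's out-of-the-box loss on a trivial input/target pair before any fine-tuning. A properly pretrained seq2seq model should give a modest loss, while a randomly initialized one typically gives a very large one:

```python
# Sketch: probe each LongT5 checkpoint's out-of-the-box loss.
# A randomly initialized model should give a much larger loss than
# a pretrained one on the same (arbitrary) input/target pair.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINTS = [
    "google/long-t5-local-base",
    "google/long-t5-tglobal-base",
    "google/long-t5-local-large",
    "google/long-t5-tglobal-large",
]

text = "summarize: The quick brown fox jumps over the lazy dog."
target = "A fox jumps over a dog."  # arbitrary probe target

for name in CHECKPOINTS:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    inputs = tokenizer(text, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss
    print(f"{name}: loss = {loss.item():.2f}")
```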
For reproduction, just run the standard Hugging Face PyTorch training script for summarization on any official dataset (CNN/DM, XSum, etc.).
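A minimal invocation might look like the following (a sketch based on the official run_summarization.py example; the exact flag names and values are assumptions and may vary across transformers versions):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path google/long-t5-local-base \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --do_train \
    --do_eval \
    --max_source_length 4096 \
    --max_target_length 512 \
    --per_device_train_batch_size 1 \
    --predict_with_generate \
    --output_dir /tmp/longt5-summarization
```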
Note that I haven't tried the 3B-parameter versions, so I cannot speak to whether this problem affects them as well.
Expected behavior
All four models should have a low validation loss when fine-tuning on summarization (as opposed to three of them showing validation losses of 200+ as if they were randomly initialized).
Top GitHub Comments
Update 2: Loading from Flax works for long-t5-tglobal-large and long-t5-local-base, but does not work for long-t5-local-large (whose training and validation losses start and flatline at around 10).
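For reference, a Flax checkpoint can be loaded into a PyTorch model with the standard from_flax flag of from_pretrained; a minimal sketch, assuming flax is installed (the output directory name is illustrative):

```python
# Sketch: load the Flax weights of a LongT5 checkpoint into a PyTorch model,
# bypassing the PyTorch checkpoint, then re-save them in PyTorch format.
# Requires `pip install flax`.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/long-t5-local-base", from_flax=True
)
model.save_pretrained("./long-t5-local-base-from-flax")  # illustrative path
```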