Cannot replicate T5 performance on WMT14
System Info
I am trying to replicate T5 finetuning on WMT with the following hyperparameters (as close as possible to the paper https://www.jmlr.org/papers/volume21/20-074/20-074.pdf):
--model_name_or_path t5-small --source_lang en --target_lang de --dataset_name stas/wmt14-en-de-pre-processed --max_source_length 512 --max_target_length 512 --val_max_target_length 512 --source_prefix="translate English to German: " --predict_with_generate --save_steps 5000 --eval_steps 5000 --learning_rate 0.001 --max_steps 262144 --optim adafactor --lr_scheduler_type constant --gradient_accumulation_steps 2 --per_device_train_batch_size 64
However, the best BLEU I get is around 13, whereas the paper reports around 27. Any comments on how to fix this?
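As a quick sanity check of the hyperparameters themselves, the flags above can be compared against the paper's fine-tuning setup (batches of 128 length-512 sequences, i.e. 2**16 tokens, for 2**18 steps). This is a sketch assuming a single GPU, as the environment info below reports:

```python
# Sanity check: do the flags above reproduce the T5 paper's
# fine-tuning batch of 128 length-512 sequences (2**16 tokens)
# for 2**18 steps? Assumes a single GPU, per the environment info.
per_device_train_batch_size = 64
gradient_accumulation_steps = 2
num_gpus = 1
max_source_length = 512
max_steps = 262_144

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_batch = effective_batch * max_source_length

print(effective_batch)        # 128 sequences per optimizer step
print(tokens_per_batch)       # 65536 tokens, i.e. 2**16
print(max_steps == 2 ** 18)   # True
```

So the effective batch size and step count do match the paper, which suggests the gap comes from something else (e.g. the checkpoint or evaluation setup) rather than these flags.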
Script: https://github.com/huggingface/transformers/blob/main/examples/pytorch/translation/run_translation.py
Environment:
- transformers version: 4.20.1
- Platform: Linux-4.18.0-348.el8.x86_64-x86_64-with-glibc2.28
- Python version: 3.10.4
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes - A100
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Run the script with the hyperparameters above: https://github.com/huggingface/transformers/blob/main/examples/pytorch/translation/run_translation.py
Expected behavior
BLEU score should be around 27.
Issue Analytics
- State:
- Created a year ago
- Comments: 14 (2 by maintainers)
Top GitHub Comments
@ekurtulus I also think the checkpoints t5-small, t5-base, etc. have been trained on the WMT / CNN DailyMail datasets, as shown in the code snippet below. So using those checkpoints to replicate the results (by finetuning on those datasets) doesn't really make sense IMO.
Code snippet
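The snippet itself did not survive on this page. A hypothetical reconstruction of the kind of check meant here, assuming the point is that the released T5 configs ship task_specific_params for the supervised tasks (translation, summarization) included in T5's multi-task training mixture:

```python
# Hypothetical reconstruction -- the original snippet is missing from
# this page. The released T5 checkpoints carry task_specific_params in
# their config for the supervised tasks they saw during multi-task
# training. Requires network access to the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("t5-small")
print(sorted(config.task_specific_params))
# e.g. ['summarization', 'translation_en_to_de',
#       'translation_en_to_fr', 'translation_en_to_ro']
```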
Outputs
Sorry for being late. I will take a look.