
Reproducibility issue on XSum dataset

Hi,

I followed the instructions for fine-tuning UniLM v1.2 on the XSum task. Specifically, I used the following commands:

python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file ${TRAIN_FILE} --output_dir ${OUTPUT_DIR} --model_type unilm \
  --model_name_or_path unilm1.2-base-uncased --do_lower_case \
  --fp16 --fp16_opt_level O2 --max_source_seq_length 464 --max_target_seq_length 48 \
  --per_gpu_train_batch_size 16 --gradient_accumulation_steps 1 --learning_rate 7e-5 \
  --num_warmup_steps 500 --num_training_steps 32000 --cache_dir ${CACHE_DIR}
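
For context, ${TRAIN_FILE} is the JSON-lines training file that the s2s-ft code reads: one JSON object per line holding the source document and target summary. A minimal sketch of producing such a file (the "src"/"tgt" key names follow my reading of the s2s-ft README; verify them against your checkout):

# Sketch: build a training file in the JSON-lines format used by s2s-ft,
# one {"src": ..., "tgt": ...} object per line (key names per the s2s-ft
# README; verify against your checkout if unsure).
import json

pairs = [
    ("full article text of the first document ...", "its one-sentence summary ..."),
    ("full article text of the second document ...", "its one-sentence summary ..."),
]

with open("train.json", "w", encoding="utf-8") as f:
    for src, tgt in pairs:
        f.write(json.dumps({"src": src, "tgt": tgt}) + "\n")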

The training loss at the end of fine-tuning goes to ~1.9

Next I decode on the test set using:

python decode_seq2seq.py --fp16 --model_type unilm --tokenizer_name unilm1.2-base-uncased \
  --input_file ${INPUT_JSON} --split $SPLIT --do_lower_case --model_path ${MODEL_PATH} \
  --max_seq_length 512 --max_tgt_length 48 --batch_size 16 --beam_size 5 --length_penalty 0 \
  --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "."
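
decode_seq2seq.py writes its output to ${MODEL_PATH}.${SPLIT}, the file the evaluation command below passes as --pred, presumably one generated summary per line. A quick sketch for eyeballing the first few outputs, with a hypothetical path standing in for ${MODEL_PATH}.${SPLIT}:

# Sketch: print the first few decoded summaries for a quick eyeball check.
# Assumes decode_seq2seq.py wrote one summary per line to ${MODEL_PATH}.${SPLIT}.
pred_path = "model.test"  # hypothetical stand-in for ${MODEL_PATH}.${SPLIT}

with open(pred_path, encoding="utf-8") as f:
    for i, line in enumerate(f):
        if i >= 3:
            break
        print(f"[{i}] {line.strip()}")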

And evaluate using:

python evaluations/eval_for_xsum.py --pred ${MODEL_PATH}.${SPLIT} \
--gold ${GOLD_PATH} --split ${SPLIT} --perl
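
eval_for_xsum.py presumably compares --pred and --gold line by line, so a cheap pre-flight check is to confirm the gold file is plain text (one reference per line) rather than JSON, and that the two files have matching line counts; as the comments below show, a mispointed ${GOLD_PATH} was exactly the problem in this issue. A sketch, with hypothetical paths standing in for the shell variables:

# Sketch: sanity-check the inputs to eval_for_xsum.py before scoring.
import json

def first_line_is_json(path):
    """Heuristic: a gold file whose first line parses as a JSON object
    (e.g. xsum.test.json) is probably not the plain-text test.target."""
    with open(path, encoding="utf-8") as f:
        first = f.readline()
    try:
        return isinstance(json.loads(first), dict)
    except json.JSONDecodeError:
        return False

def count_lines(path):
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

pred_path = "model.test"   # hypothetical stand-in for ${MODEL_PATH}.${SPLIT}
gold_path = "test.target"  # hypothetical stand-in for ${GOLD_PATH}

assert not first_line_is_json(gold_path), "gold file looks like JSON, not plain text"
assert count_lines(pred_path) == count_lines(gold_path), "prediction/gold line counts differ"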

However, this gives me the following ROUGE scores:

ROUGE-F(1/2/l): 9.75/4.47/7.22
ROUGE-R(1/2/l): 5.34/2.44/3.95

I also tried fine-tuning MiniLM based on the instructions provided; the results are similarly low.

@donglixp @wolfshow Could you please guide me on what I must be doing wrong?

Thanks in advance!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
HareeshBahuleyan commented on Apr 29, 2020

@donglixp Oops, that was indeed the issue. I was wrongly using xsum.test.json as $GOLD_PATH.

It works perfectly now with the following ROUGE scores:

ROUGE-1: 0.43251214807560934    ROUGE-2: 0.20548924503436466

Thanks for the help. You may close this issue 👍

0 reactions
donglixp commented on Apr 29, 2020

I checked your decoding results, and they look well aligned with the references. In the evaluation command, does the env var ${GOLD_PATH} point to the test.target file from https://unilm.blob.core.windows.net/s2s-ft-data/xsum.eval.zip?
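
For anyone who hits the same mix-up: --gold should point at the plain-text test.target from xsum.eval.zip, not at xsum.test.json. If only the JSON file is on hand, a reference file can be recovered from it, assuming each line is a JSON object with the summary under a "tgt" key (an assumption; inspect the file before relying on it):

# Sketch: recover a plain-text gold file from xsum.test.json.
# Assumes one JSON object per line with the reference summary under "tgt"
# (the key name is an assumption; inspect the file before relying on it).
import json

with open("xsum.test.json", encoding="utf-8") as fin, \
     open("test.target", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(json.loads(line)["tgt"].strip() + "\n")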
