
google/pegasus-cnn_dailymail generates blank file

See original GitHub issue

Environment info

  • transformers version: 4.2.0 and 4.5.1
  • Platform: linux
  • Python version: 3.6
  • PyTorch version (GPU?): 1.7.1
  • Tensorflow version (GPU?): NA
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes (I also tried without distributed, and the problem persists)

Who can help

@patrickvonplaten, @patil-suraj

Information

Model I am using (Bert, XLNet …): google/pegasus-cnn_dailymail

The task I am working on is:

  • an official GLUE/SQuAD task: summarization with ROUGE

To reproduce

Steps to reproduce the behavior:

I am trying to generate summaries with Pegasus on the CNN/DM and XSUM datasets. I use the same datasets shared by HuggingFace (linked from the README.md at https://github.com/huggingface/transformers/tree/master/examples/legacy/seq2seq). My experiments run on 3 V100 GPUs. I use google/pegasus-cnn_dailymail for CNN/DM and google/pegasus-xsum for XSUM.

  1. The results on XSUM are perfect. I run the following command and receive the ROUGE scores: {'rouge1': 47.0271, 'rouge2': 24.4924, 'rougeL': 39.2529, 'n_obs': 11333, 'seconds_per_sample': 0.035, 'n_gpus': 3}
python -m torch.distributed.launch --nproc_per_node=3  run_distributed_eval.py \
    --model_name google/pegasus-xsum  \
    --save_dir $OUTPUT_DIR \
    --data_dir $DATA_DIR \
    --bs 64 \
    --fp16
  2. I was expecting similar SOTA performance on CNN/DM, so I run the following command and receive: {"n_gpus": 3, "n_obs": 11490, "rouge1": 0.1602, "rouge2": 0.084, "rougeL": 0.1134, "seconds_per_sample": 0.1282}.

(Note: the batch size is smaller here due to memory limitations. Although the experiments run on the same devices, CNN/DM requires more memory because its documents and summaries are longer.)

python -m torch.distributed.launch --nproc_per_node=3  run_distributed_eval.py \
    --model_name google/pegasus-cnn_dailymail  \
    --save_dir $OUTPUT_DIR \
    --data_dir $DATA_DIR \
    --bs 32 \
    --fp16
  3. I inspect the generated test_generations.txt file to figure out why google/pegasus-cnn_dailymail doesn’t work, and find that most of its lines are blank (see the attached screenshot for an example).

[screenshot of test_generations.txt with mostly blank lines omitted]

Expected behavior

It is very strange that google/pegasus-xsum works perfectly while google/pegasus-cnn_dailymail fails to generate summaries. I was confused, so I switched the transformers version (4.2.0 and 4.5.1) and re-ran the experiments on different GPUs, but the problem persists. Could you please give me any suggestions? Thank you!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions
patil-suraj commented, Apr 20, 2021

Hi @chz816

I can reproduce the issue. This is because Pegasus doesn’t really work with fp16, since it was trained in bfloat16; in most cases the logits overflow and come back as nan. The model works as expected in fp32, so if you run the above command without the --fp16 arg, it should give the expected results.
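To see why a bfloat16-trained model overflows in fp16, note that fp16 has only a 5-bit exponent (max finite value 65504), while bfloat16 keeps fp32's 8-bit exponent. A minimal numpy sketch (not from the thread, illustration only):

```python
import numpy as np

# fp16 (IEEE half precision) tops out at 65504. bfloat16 shares fp32's
# 8-bit exponent, so activations that fit comfortably in bf16/fp32
# overflow to inf when cast down to fp16.
x = np.float32(70000.0)   # representable in fp32 (and bf16) range
y = np.float16(x)         # overflows to inf in half precision

print(np.finfo(np.float16).max)  # 65504.0
print(y, np.isinf(y))            # inf True
```

Once an intermediate activation becomes inf, downstream logits turn into nan, and generation degenerates into empty output, matching the blank test_generations.txt above.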

cc @stas00

1 reaction
stas00 commented, Apr 20, 2021

I’m able to reproduce this with the “modern” version of the script:

rm -rf output_dir; USE_TF=0 PYTHONPATH=src python examples/seq2seq/run_summarization.py \
--model_name_or_path google/pegasus-cnn_dailymail --do_eval --dataset_name cnn_dailymail \
--dataset_config "3.0.0" --output_dir output_dir \
--per_device_eval_batch_size=16 --predict_with_generate --fp16_full_eval --max_val_samples 10

[...]

***** eval metrics *****
  eval_gen_len              =        9.0
  eval_loss                 =        nan
  eval_mem_cpu_alloc_delta  =      -55MB
  eval_mem_cpu_peaked_delta =       55MB
  eval_mem_gpu_alloc_delta  =     1089MB
  eval_mem_gpu_peaked_delta =     7241MB
  eval_rouge1               =        0.0
  eval_rouge2               =        0.0
  eval_rougeL               =        0.0
  eval_rougeLsum            =        0.0
  eval_runtime              = 0:00:07.71
  eval_samples              =         10
  eval_samples_per_second   =      1.295
  init_mem_cpu_alloc_delta  =        0MB
  init_mem_cpu_peaked_delta =        0MB
  init_mem_gpu_alloc_delta  =        0MB
  init_mem_gpu_peaked_delta =        0MB