
google/pegasus-cnn_dailymail generates blank file

See original GitHub issue

Environment info

  • transformers version: 4.2.0 and 4.5.1
  • Platform: linux
  • Python version: 3.6
  • PyTorch version (GPU?): 1.7.1
  • Tensorflow version (GPU?): NA
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes (I also tried without distributed, and the problem persists)

Who can help

@patrickvonplaten, @patil-suraj

Information

Model I am using (Bert, XLNet …): google/pegasus-cnn_dailymail

The task I am working on is:

  • an official GLUE/SQuAD task: summarization with ROUGE

To reproduce

Steps to reproduce the behavior:

I am trying to generate summaries with Pegasus on the CNN/DM and XSUM datasets. I use the same datasets shared by HuggingFace (linked from the README.md at https://github.com/huggingface/transformers/tree/master/examples/legacy/seq2seq). My experiments run on 3 V100 GPUs. I use google/pegasus-cnn_dailymail for CNN/DM and google/pegasus-xsum for XSUM.

  1. The results on XSUM are perfect. I run the following command and receive the ROUGE scores: {'rouge1': 47.0271, 'rouge2': 24.4924, 'rougeL': 39.2529, 'n_obs': 11333, 'seconds_per_sample': 0.035, 'n_gpus': 3}
python -m torch.distributed.launch --nproc_per_node=3  run_distributed_eval.py \
    --model_name google/pegasus-xsum  \
    --save_dir $OUTPUT_DIR \
    --data_dir $DATA_DIR \
    --bs 64 \
    --fp16
  2. I was expecting similar SOTA performance on CNN/DM, so I run the following command and receive: {"n_gpus": 3, "n_obs": 11490, "rouge1": 0.1602, "rouge2": 0.084, "rougeL": 0.1134, "seconds_per_sample": 0.1282}.

(Note: the batch size is smaller here due to memory limitations. Although the experiments run on the same devices, CNN/DM requires more memory because its documents and summaries are longer.)

python -m torch.distributed.launch --nproc_per_node=3  run_distributed_eval.py \
    --model_name google/pegasus-cnn_dailymail  \
    --save_dir $OUTPUT_DIR \
    --data_dir $DATA_DIR \
    --bs 32 \
    --fp16
  3. I inspect the generated test_generations.txt file to figure out why google/pegasus-cnn_dailymail doesn’t work, and find that most of its lines are blank (see the attached screenshot for an example).

[screenshot of test_generations.txt with mostly blank lines omitted]

Expected behavior

It is very strange that google/pegasus-xsum works perfectly while google/pegasus-cnn_dailymail fails to generate summaries. I was confused, so I switched the transformers version (4.2.0 and 4.5.1) and re-ran the experiments on different GPUs, but the problem persists. Could you please give me any suggestions? Thank you!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions
patil-suraj commented, Apr 20, 2021

Hi @chz816

I can reproduce the issue. This is because Pegasus doesn’t really work with fp16, since it was trained in bfloat16; in most cases the logits overflow and come back as nan. The model works as expected in fp32, so if you run the above command without the --fp16 arg, it should give the expected results.
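To see why a bfloat16-trained model overflows in fp16, note that fp16 has only a 5-bit exponent (max finite value 65504), while bfloat16 keeps fp32's 8-bit exponent. A minimal numpy sketch (not from the thread, illustration only):

```python
import numpy as np

# fp16 (IEEE half precision) tops out at 65504. bfloat16 shares fp32's
# 8-bit exponent, so activations that fit comfortably in bf16/fp32
# overflow to inf when cast down to fp16.
x = np.float32(70000.0)   # representable in fp32 (and bf16) range
y = np.float16(x)         # overflows to inf in half precision

print(np.finfo(np.float16).max)  # 65504.0
print(y, np.isinf(y))            # inf True
```

Once an intermediate activation becomes inf, downstream logits turn into nan, and generation degenerates into empty output, matching the blank test_generations.txt above.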

cc @stas00

1 reaction
stas00 commented, Apr 20, 2021

I’m able to reproduce this with the “modern” version of the script:

rm -rf output_dir; USE_TF=0 PYTHONPATH=src python examples/seq2seq/run_summarization.py \
--model_name_or_path google/pegasus-cnn_dailymail --do_eval --dataset_name cnn_dailymail \
--dataset_config "3.0.0" --output_dir output_dir \
--per_device_eval_batch_size=16 --predict_with_generate --fp16_full_eval --max_val_samples 10

[...]

***** eval metrics *****
  eval_gen_len              =        9.0
  eval_loss                 =        nan
  eval_mem_cpu_alloc_delta  =      -55MB
  eval_mem_cpu_peaked_delta =       55MB
  eval_mem_gpu_alloc_delta  =     1089MB
  eval_mem_gpu_peaked_delta =     7241MB
  eval_rouge1               =        0.0
  eval_rouge2               =        0.0
  eval_rougeL               =        0.0
  eval_rougeLsum            =        0.0
  eval_runtime              = 0:00:07.71
  eval_samples              =         10
  eval_samples_per_second   =      1.295
  init_mem_cpu_alloc_delta  =        0MB
  init_mem_cpu_peaked_delta =        0MB
  init_mem_gpu_alloc_delta  =        0MB
  init_mem_gpu_peaked_delta =        0MB