[s2s] Trainer vs PTL timings

For the two commands below:

  • PTL: 2.01 it/s, ~3 h, 21.32 ROUGE
  • Trainer: 1.0 it/s, ~5.5 h, 21.36 ROUGE

I wanted to report this so I don’t lose track of it. I looked at the code and don’t see any obvious issue, except that the slowdown is suspiciously close to 2x.
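
As a rough sanity check on the numbers above, the two ratios can be computed directly. This is only arithmetic on the rounded figures quoted in this issue; the variable names are mine and nothing here was measured separately.

```python
# Figures as reported above (rounded; the hours are approximate).
ptl_its, ptl_hours = 2.01, 3.0
trainer_its, trainer_hours = 1.0, 5.5

# Throughput ratio: how many times faster PTL steps through batches.
throughput_ratio = ptl_its / trainer_its

# Wall-clock ratio: how many times longer the Trainer run took end to end.
wallclock_ratio = trainer_hours / ptl_hours

print(f"throughput ratio: {throughput_ratio:.2f}x")  # 2.01x
print(f"wall-clock ratio: {wallclock_ratio:.2f}x")   # 1.83x
```

The per-step gap is about 2x, while the end-to-end gap is somewhat smaller. The hours are approximate, and the two runs evaluate on different schedules (`--val_check_interval 0.25` vs `--eval_steps 3000`), so the totals are probably not strictly comparable.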

Any idea, @patil-suraj?

PTL Command


export BS=32
export GAS=1

python finetune.py \
    --learning_rate=3e-5 \
    --fp16 \
    --gpus 1 \
    --do_train \
    --do_predict \
    --val_check_interval 0.25 \
    --n_val 500 \
    --num_train_epochs 2 \
    --freeze_encoder --freeze_embeds --data_dir cnn_dm \
    --max_target_length 142 --val_max_target_length=142 \
    --train_batch_size=$BS --eval_batch_size=$BS --gradient_accumulation_steps=$GAS \
    --model_name_or_path sshleifer/student_cnn_12_6 \
    --tokenizer_name facebook/bart-large \
    --warmup_steps 500 \
    --output_dir distilbart-cnn-12-6

Trainer Command

Same as builtin_trainer/train_distilbart_cnn.sh:

export BS=32
export GAS=1
export m=sshleifer/student_cnn_12_6
export tok=facebook/bart-large
export MAX_TGT_LEN=142

python finetune_trainer.py \
    --model_name_or_path $m --tokenizer_name $tok \
    --data_dir cnn_dm \
    --output_dir distilbart-cnn-12-6-trainer --overwrite_output_dir \
    --learning_rate=3e-5 --sortish_sampler \
    --warmup_steps 500 \
    --fp16 \
    --n_val 500 \
    --gradient_accumulation_steps=$GAS \
    --per_device_train_batch_size=$BS --per_device_eval_batch_size=$BS \
    --freeze_encoder --freeze_embeds \
    --num_train_epochs=2 \
    --save_steps 3000 --eval_steps 3000 \
    --logging_first_step \
    --max_target_length 142 --val_max_target_length $MAX_TGT_LEN --test_max_target_length $MAX_TGT_LEN \
    --do_train --do_eval --do_predict --evaluate_during_training \
    --predict_with_generate --sortish_sampler
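
One quick way to look for a cause is to list which flags appear in only one of the two commands. The sketch below does that with plain string parsing. The helper `flag_names` is hypothetical (it is not part of either script), and the command strings are condensed copies of the commands above, with the repeated sampler flag listed once.

```python
import shlex


def flag_names(command: str) -> set:
    """Return the names of all --flags in a command line, ignoring their values."""
    names = set()
    for token in shlex.split(command):
        if token.startswith("--"):
            # Handle both "--flag value" and "--flag=value".
            names.add(token[2:].split("=", 1)[0])
    return names


PTL_CMD = (
    "python finetune.py --learning_rate=3e-5 --fp16 --gpus 1 --do_train --do_predict "
    "--val_check_interval 0.25 --n_val 500 --num_train_epochs 2 "
    "--freeze_encoder --freeze_embeds --data_dir cnn_dm "
    "--max_target_length 142 --val_max_target_length=142 "
    "--train_batch_size=$BS --eval_batch_size=$BS --gradient_accumulation_steps=$GAS "
    "--model_name_or_path sshleifer/student_cnn_12_6 --tokenizer_name facebook/bart-large "
    "--warmup_steps 500 --output_dir distilbart-cnn-12-6"
)

TRAINER_CMD = (
    "python finetune_trainer.py --model_name_or_path $m --tokenizer_name $tok "
    "--data_dir cnn_dm --output_dir distilbart-cnn-12-6-trainer --overwrite_output_dir "
    "--learning_rate=3e-5 --warmup_steps 500 --fp16 --n_val 500 "
    "--gradient_accumulation_steps=$GAS "
    "--per_device_train_batch_size=$BS --per_device_eval_batch_size=$BS "
    "--freeze_encoder --freeze_embeds --num_train_epochs=2 "
    "--save_steps 3000 --eval_steps 3000 --logging_first_step "
    "--max_target_length 142 --val_max_target_length $MAX_TGT_LEN "
    "--test_max_target_length $MAX_TGT_LEN "
    "--do_train --do_eval --do_predict --evaluate_during_training "
    "--predict_with_generate --sortish_sampler"
)

ptl_flags = flag_names(PTL_CMD)
trainer_flags = flag_names(TRAINER_CMD)

only_ptl = sorted(ptl_flags - trainer_flags)
only_trainer = sorted(trainer_flags - ptl_flags)

print("only in PTL command:    ", only_ptl)
print("only in Trainer command:", only_trainer)
```

Some of the differences are just renames (`train_batch_size` vs `per_device_train_batch_size`). Others, such as `sortish_sampler`, `predict_with_generate`, and the different evaluation schedules, appear in only one command. They are candidates worth ruling out, not a diagnosis, since a flag missing from a command line may still be on by default in the script.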

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
patil-suraj commented, Oct 29, 2020

I’m also experiencing a slowdown on TPUs; I haven’t run the new changes on GPU yet. I’ll investigate this.

0 reactions
stale[bot] commented, Jan 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
