[s2s] Trainer vs PTL timings
For the following two commands,
- PTL finishes: 2.01 it/s, ~3H, 21.32 Rouge
- Trainer: 1.0 it/s, roughly 5.5H, 21.36 Rouge
I wanted to report this so I don’t lose track of it. I looked at the code and don’t see any obvious issue, except that the slowdown is suspiciously close to 2x.
Any ideas, @patil-suraj?
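For reference, here is a quick sketch of the arithmetic behind the "close to 2x" remark, using only the figures quoted above. The variable names are illustrative, and the wall-clock hours are the rough values from the report, so the two ratios are not expected to match exactly.

```python
# Figures copied from the report above.
ptl_its, trainer_its = 2.01, 1.0      # iterations per second
ptl_hours, trainer_hours = 3.0, 5.5   # approximate wall-clock hours

# How much faster PTL steps through batches than Trainer.
throughput_ratio = ptl_its / trainer_its

# How much longer the Trainer run took end to end.
wallclock_ratio = trainer_hours / ptl_hours

print(f"throughput ratio: {throughput_ratio:.2f}x")
print(f"wall-clock ratio: {wallclock_ratio:.2f}x")
```

The per-step throughput differs by about 2x, while the end-to-end time differs by a bit less than that.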
PTL Command
export BS=32
export GAS=1
python finetune.py \
--learning_rate=3e-5 \
--fp16 \
--gpus 1 \
--do_train \
--do_predict \
--val_check_interval 0.25 \
--n_val 500 \
--num_train_epochs 2 \
--freeze_encoder --freeze_embeds --data_dir cnn_dm \
--max_target_length 142 --val_max_target_length=142 \
--train_batch_size=$BS --eval_batch_size=$BS --gradient_accumulation_steps=$GAS \
--model_name_or_path sshleifer/student_cnn_12_6 \
--tokenizer_name facebook/bart-large \
--warmup_steps 500 \
--output_dir distilbart-cnn-12-6
Trainer command
Same as builtin_trainer/train_distilbart_cnn.sh:
export BS=32
export GAS=1
export m=sshleifer/student_cnn_12_6
export tok=facebook/bart-large
export MAX_TGT_LEN=142
python finetune_trainer.py \
--model_name_or_path $m --tokenizer_name $tok \
--data_dir cnn_dm \
--output_dir distilbart-cnn-12-6-trainer --overwrite_output_dir \
--learning_rate=3e-5 \
--warmup_steps 500 \
--fp16 \
--n_val 500 \
--gradient_accumulation_steps=$GAS \
--per_device_train_batch_size=$BS --per_device_eval_batch_size=$BS \
--freeze_encoder --freeze_embeds \
--num_train_epochs=2 \
--save_steps 3000 --eval_steps 3000 \
--logging_first_step \
--max_target_length 142 --val_max_target_length $MAX_TGT_LEN --test_max_target_length $MAX_TGT_LEN \
--do_train --do_eval --do_predict --evaluate_during_training \
--predict_with_generate --sortish_sampler
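Since nothing obvious stands out in the code, one cheap first check is to diff the flag names of the two commands. The sketch below does only that: `flag_names` is a hypothetical helper written for this illustration, the command strings are the ones above with line continuations removed, and flags that differ only in name (for example `--train_batch_size` vs `--per_device_train_batch_size`) still show up as differences. It says nothing about which difference, if any, explains the slowdown.

```python
import shlex


def flag_names(command: str) -> set:
    """Return the set of --flag names in a command line, ignoring values."""
    names = set()
    for token in shlex.split(command):
        if token.startswith("--"):
            # Drop an attached "=value" and treat "-" and "_" alike.
            names.add(token[2:].split("=", 1)[0].replace("-", "_"))
    return names


ptl_cmd = """
python finetune.py --learning_rate=3e-5 --fp16 --gpus 1 --do_train --do_predict
  --val_check_interval 0.25 --n_val 500 --num_train_epochs 2
  --freeze_encoder --freeze_embeds --data_dir cnn_dm
  --max_target_length 142 --val_max_target_length=142
  --train_batch_size=$BS --eval_batch_size=$BS --gradient_accumulation_steps=$GAS
  --model_name_or_path sshleifer/student_cnn_12_6
  --tokenizer_name facebook/bart-large --warmup_steps 500
  --output_dir distilbart-cnn-12-6
"""

trainer_cmd = """
python finetune_trainer.py --model_name_or_path $m --tokenizer_name $tok
  --data_dir cnn_dm --output_dir distilbart-cnn-12-6-trainer --overwrite_output_dir
  --learning_rate=3e-5 --warmup_steps 500 --fp16 --n_val 500
  --gradient_accumulation_steps=$GAS
  --per_device_train_batch_size=$BS --per_device_eval_batch_size=$BS
  --freeze_encoder --freeze_embeds --num_train_epochs=2
  --save_steps 3000 --eval_steps 3000 --logging_first_step
  --max_target_length 142 --val_max_target_length $MAX_TGT_LEN
  --test_max_target_length $MAX_TGT_LEN
  --do_train --do_eval --do_predict --evaluate_during_training
  --predict_with_generate --sortish_sampler
"""

ptl_flags = flag_names(ptl_cmd)
trainer_flags = flag_names(trainer_cmd)

only_ptl = ptl_flags - trainer_flags
only_trainer = trainer_flags - ptl_flags

print("only in PTL command:    ", sorted(only_ptl))
print("only in Trainer command:", sorted(only_trainer))
```

Any flag that appears on only one side (sampling, evaluation schedule, generation during evaluation) is a candidate to toggle while re-timing a short run.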
I’m also experiencing a slowdown on TPUs; I haven’t run the new changes on GPU yet. I’ll investigate this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.