GPU and batch size setting
See original GitHub issueI am using the training command in readme
python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_finetune_wikitq --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 16 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
my question is how to set GPU and batch size, it said this command is 4 GPU and 128 batch size, but i didn’t see it in this command, neither in the code
Thx
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:11 (5 by maintainers)
It works, thank for your reply this days.
Hi,
I think it may be because in our command, there is a
--nproc_per_node 4
which should be corresponded to the number of GPUs?Hope this information helpful! Thanks