
GPU and batch size setting

See original GitHub issue

I am using the training command from the README:

python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py \
    --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq \
    --logging_strategy steps --logging_first_step true --logging_steps 4 \
    --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true \
    --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
    --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 \
    --do_train --do_eval --do_predict --predict_with_generate \
    --output_dir output/T5_base_finetune_wikitq --overwrite_output_dir \
    --per_device_train_batch_size 4 --per_device_eval_batch_size 16 \
    --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 \
    --ddp_find_unused_parameters true

My question is how to set the GPU count and the batch size. It is said this command uses 4 GPUs and a batch size of 128, but I don't see either value in the command, nor in the code. Thx
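For context on why the GPU count never appears as an explicit flag: torch.distributed.launch spawns --nproc_per_node worker processes, and each worker receives its local rank (as a --local_rank argument, or as a LOCAL_RANK environment variable when using --use_env or torchrun), which training scripts typically map to a CUDA device. A minimal sketch of that mapping, assuming the environment-variable style; no GPU is required here, the device string is only constructed, not used:

```python
import os

def pick_device(env=os.environ):
    """Map a distributed worker's local rank to a CUDA device string.

    torch.distributed.launch (with --use_env) and torchrun export
    LOCAL_RANK per worker; each process binds to the matching GPU.
    """
    local_rank = int(env.get("LOCAL_RANK", 0))
    return f"cuda:{local_rank}"

# Simulated: worker 2 of a 4-process launch sees LOCAL_RANK=2.
print(pick_device({"LOCAL_RANK": "2"}))  # cuda:2
```

So with --nproc_per_node 4, four such processes start, each claiming one of cuda:0 through cuda:3.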

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
cdhx commented, Mar 17, 2022

> Hi,
>
> I think it may be because in our command there is --nproc_per_node 4, which corresponds to the number of GPUs?
>
> Hope this information is helpful! Thanks

It works, thanks for your reply these days.

1 reaction
Timothyxxx commented, Mar 17, 2022

Hi,

I think it may be because in our command there is --nproc_per_node 4, which corresponds to the number of GPUs?

Hope this information is helpful! Thanks
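The 128 figure falls out of the remaining flags in the command: with data-parallel training, the effective global batch size is the number of processes times the per-device batch size times the gradient accumulation steps, i.e. 4 × 4 × 8 = 128. A minimal sketch of that arithmetic, using the values from the command above:

```python
# Effective global batch size for the command in the question.
# The values come from the command's flags; the formula is the
# standard one for torch.distributed data-parallel training.
nproc_per_node = 4               # --nproc_per_node (one process per GPU)
per_device_train_batch_size = 4  # --per_device_train_batch_size
gradient_accumulation_steps = 8  # --gradient_accumulation_steps

effective_batch_size = (
    nproc_per_node
    * per_device_train_batch_size
    * gradient_accumulation_steps
)
print(effective_batch_size)  # 128
```

Changing any one of the three flags scales the effective batch size proportionally, so there is no single "batch size 128" flag to look for.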


Top Results From Across the Web

Batch size and GPU memory limitations in neural networks
The batch size is the number of samples (e.g. images) used to train a model before updating its trainable model variables — the...

How to maximize GPU utilization by finding the right batch size
Increasing batch size is a straightforward technique to boost GPU usage, though it is not always successful. The gradient of the batch size...

Effect of batch size and number of GPUs on model accuracy
The batch size doesn't matter to performance too much, as long as you set a reasonable batch size (16+) and keep the iterations...

tensorflow - How to select batch size automatically to fit GPU?
PyTorch Lightning recently added a feature called "auto batch size", especially for this! It computes the max batch size that can fit into...

What is relationship between batch size and GPU processor...
The larger the batch, the more effectively the GPU can run. If the batch size is very small, there is relatively a lot...
