
Incorrect `num_warmup_steps` for `lr_scheduler` for multi-gpu training


System Info

- `Accelerate` version: 0.10.0
- Platform: Linux-3.10.0_3-0-0-12-x86_64-with-centos-6.3-Final
- Python version: 3.7.12
- Numpy version: 1.21.6
- PyTorch version (GPU?): 1.7.1 (True)
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: no
        - use_cpu: False
        - num_processes: 8
        - machine_rank: 0
        - num_machines: 1
        - main_process_ip: None
        - main_process_port: None
        - main_training_function: main
        - deepspeed_config: {}
        - fsdp_config: {}

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

https://github.com/huggingface/transformers/blob/f2fbe4475386bfcfb3b83d0a3223ba216a3c3a91/examples/pytorch/translation/run_translation_no_trainer.py#L533

# define lr scheduler
lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=args.warmup_steps,
    num_training_steps=args.max_train_steps,
)

...

if step % args.gradient_accumulation_steps == 0:
    optimizer.step()
    lr_scheduler.step()  # update lr scheduler every `gradient_accumulation_steps`
    optimizer.zero_grad()

Expected behavior

Does Accelerate account for the number of processes in num_warmup_steps? Suppose we set args.warmup_steps=80 and train on a single 8-GPU machine: the linear learning rate then peaks after only 10 optimizer steps (i.e., 80/8) rather than the expected 80, as the sketch below illustrates.
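To make the arithmetic concrete, here is a minimal sketch (Accelerate itself is not required) that emulates what an Accelerate-prepared scheduler does, namely advancing the wrapped scheduler once per process on every call, the behaviour referenced via the accelerate/scheduler.py link in the comments below. The values num_processes = 8 and warmup_steps = 80 come from the example above; max_train_steps = 800 is an illustrative assumption.

import torch
from transformers import get_scheduler

num_processes = 8       # GPUs, as in the report above
warmup_steps = 80       # args.warmup_steps in the report
max_train_steps = 800   # illustrative value for args.max_train_steps

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=max_train_steps,
)

peak_step = None
for step in range(1, max_train_steps // num_processes + 1):
    optimizer.step()
    # An Accelerate-prepared scheduler advances the wrapped scheduler once
    # per process on every .step() call; emulate that behaviour here.
    for _ in range(num_processes):
        lr_scheduler.step()
    if peak_step is None and optimizer.param_groups[0]["lr"] >= 1.0:
        peak_step = step

print(f"learning rate peaks at optimizer step {peak_step}, not {warmup_steps}")
# -> learning rate peaks at optimizer step 10, not 80

Running the same loop with num_processes = 1 peaks at optimizer step 80, which is the behaviour the reproduction above expects.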

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 19 (1 by maintainers)

Top GitHub Comments

1 reaction
pacman100 commented, Aug 29, 2022

Quoting @cyk1337's question: "According to the design of accelerate (https://github.com/huggingface/accelerate/blob/d0f5f4a630bda69dcf89cc6d55f93c71f2af7a0d/src/accelerate/scheduler.py#L70), is it correct to set warmup_steps to warmup_steps * num_processes, or should the lr_scheduler simply not be prepared?"

Hello @cyk1337, the link you have provided achieves args.max_train_steps // num_gpus because it is stepping num_processes times per iteration, i.e., num_gpus times per iteration.

I didn't understand what the query was in the case of not preparing the lr_scheduler. As per the original question, it is logical for the warmup steps to be reduced in a multi-device scenario.
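The scaling the quoted question asks about can be sketched as follows; this reuses args and optimizer from the Reproduction snippet above and assumes accelerator is an accelerate.Accelerator instance, so it is an illustration of the idea rather than a drop-in patch for the example script.

from accelerate import Accelerator
from transformers import get_scheduler

accelerator = Accelerator()

# Because a prepared scheduler steps `num_processes` times per optimizer
# step, multiplying the counts keeps the warmup at `args.warmup_steps`
# optimizer steps from the optimizer's point of view.
lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=args.warmup_steps * accelerator.num_processes,
    num_training_steps=args.max_train_steps * accelerator.num_processes,
)
lr_scheduler = accelerator.prepare(lr_scheduler)

The other option raised in the question, not passing the lr_scheduler to accelerator.prepare at all, also keeps one scheduler step per optimizer step; the reply above, however, treats the reduced warmup of a prepared scheduler as intended behaviour in a multi-device setup.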

0 reactions
github-actions[bot] commented, Oct 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
