
How to set num_training_steps in lr_scheduler properly

See original GitHub issue

Usually I call something like this to set the scheduler

from transformers import get_linear_schedule_with_warmup

scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_step, num_training_steps=num_training_steps
)

And num_training_steps usually equals:

t_total = int(len(train_dataloader) * num_epochs)  # len(train_dataloader) = number of batches per epoch
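For concreteness, a quick worked example with made-up numbers (one scheduler step per optimizer step, no gradient accumulation):

num_epochs = 3
num_batches = 1000                       # len(train_dataloader)
t_total = int(num_batches * num_epochs)  # 3000 scheduler steps in total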

If I use the accelerator, should I change num_training_steps to something like the line below? And how should I understand it: I believe len(train_dataloader) here is the number of batches on each device.

t_total = int(len(train_dataloader) * num_epochs // accelerator.num_processes)

All the above operations happen before accelerator.prepare.
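For context, here is a minimal sketch of the pattern used in the Accelerate example scripts, where the scheduler is built from the unprepared dataloader length and then passed through accelerator.prepare (model, optimizer, train_dataloader, warmup_step, and num_epochs are assumed to be defined already):

from accelerate import Accelerator
from transformers import get_linear_schedule_with_warmup

accelerator = Accelerator()

# num_training_steps computed from the *unprepared* dataloader length
num_training_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_step, num_training_steps=num_training_steps
)

# prepare() also wraps the scheduler so its stepping stays consistent across processes
model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)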

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

2 reactions
muellerzr commented, Aug 16, 2022

@allanj Everything and anything that has to do with gradient accumulation, Accelerate will now handle for you. Just pass in the gradient_accumulation_steps arg and make no changes to your code, as if you weren’t using gradient accumulation at all 😄
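A minimal sketch of that setup, assuming a recent Accelerate version that supports the accumulate context manager (model, optimizer, train_dataloader, and scheduler are placeholders):

from accelerate import Accelerator

# Tell Accelerate how many batches to accumulate over; the loop itself stays unchanged.
accelerator = Accelerator(gradient_accumulation_steps=4)

model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

for batch in train_dataloader:
    # Under accumulate(), gradient syncing and the optimizer/scheduler steps are
    # handled so that an update happens only every gradient_accumulation_steps batches.
    with accelerator.accumulate(model):
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()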

2 reactions
takiholadi commented, Aug 16, 2022

Hi @muellerzr, I don’t quite understand. I thought the sample sent by @takiholadi has no accelerator.num_processes. Why should we revise that?

Since: https://github.com/huggingface/accelerate/blob/b0f8189d34fa42821ca041e2cba161db864c76b5/src/accelerate/scheduler.py#L31-L32
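The linked lines are where the prepared scheduler compensates for multi-process training: when batches are not split across devices, every process consumes its own batch per step, so the wrapped scheduler advances once per process to stay aligned with the single-process step count. Roughly, and only as a simplified illustration (not the real class):

# Simplified sketch of what accelerate's AcceleratedScheduler does
# (names and details are approximate, not the actual implementation)
class AcceleratedSchedulerSketch:
    def __init__(self, scheduler, num_processes, split_batches=False):
        self.scheduler = scheduler
        self.num_processes = num_processes
        self.split_batches = split_batches

    def step(self, *args, **kwargs):
        if self.split_batches:
            # one global batch per step -> advance the schedule once
            self.scheduler.step(*args, **kwargs)
        else:
            # each process saw its own batch -> advance once per process
            for _ in range(self.num_processes):
                self.scheduler.step(*args, **kwargs)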

Read more comments on GitHub >

Top Results From Across the Web

  • How to change optimizer and lr scheduler in the middle of ...
    I need to train a model multi-phases with a pre-trained backbone. For the first 10 epochs, I want to have the backbone frozen...
  • How to change optimizer and lr scheduler in the ... - GitHub
    You can try the on_train_epoch_start method in the callback and reconfigure the optimizers and schedulers the way you want. Here are some links...
  • PyTorch LR Scheduler - Adjust The Learning Rate ... - YouTube
    In this PyTorch tutorial we learn how to use a Learning Rate (LR) Scheduler to adjust the LR during training. Models often...
  • StepLR — PyTorch 1.13 documentation
    When last_epoch=-1, sets initial lr as lr. Parameters: optimizer (Optimizer) – Wrapped optimizer. step_size (int) – Period of learning rate decay.
  • Guide to Pytorch Learning Rate Scheduling | Kaggle
    Sets the learning rate of each parameter group to the initial lr times a given ... Learning Rate = ", optimizer.param_groups[0]["lr"]) scheduler.step() ...
