
How to set num_training_steps in lr_scheduler properly

See original GitHub issue

Usually I call something like this to set the scheduler

from transformers import get_linear_schedule_with_warmup

scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_step, num_training_steps=num_training_steps
)

And num_training_steps usually equals:

t_total = int(len(train_dataloader) * num_epochs)  # len(train_dataloader) = number of batches per epoch
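For concreteness, a quick worked example with made-up numbers (one scheduler step per optimizer step, no gradient accumulation):

num_epochs = 3
num_batches = 1000                       # len(train_dataloader)
t_total = int(num_batches * num_epochs)  # 3000 scheduler steps in total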

If I use the accelerator, should I change num_training_steps to something like the line below? And how should I understand it: I believe len(train_dataloader) here is the number of batches on each device.

t_total = int(len(train_dataloader) * num_epochs // accelerator.num_processes)

All the above operations happen before accelerator.prepare.
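For context, here is a minimal sketch of the pattern used in the Accelerate example scripts, where the scheduler is built from the unprepared dataloader length and then passed through accelerator.prepare (model, optimizer, train_dataloader, warmup_step, and num_epochs are assumed to be defined already):

from accelerate import Accelerator
from transformers import get_linear_schedule_with_warmup

accelerator = Accelerator()

# num_training_steps computed from the *unprepared* dataloader length
num_training_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_step, num_training_steps=num_training_steps
)

# prepare() also wraps the scheduler so its stepping stays consistent across processes
model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)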

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

2 reactions
muellerzr commented, Aug 16, 2022

@allanj Everything and anything that has to do with gradient accumulation, Accelerate will now handle for you. Just pass in the gradient_accumulation_steps arg and make no changes to your code, as if you weren’t using gradient accumulation at all 😄
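A minimal sketch of that setup, assuming a recent Accelerate version that supports the accumulate context manager (model, optimizer, train_dataloader, and scheduler are placeholders):

from accelerate import Accelerator

# Tell Accelerate how many batches to accumulate over; the loop itself stays unchanged.
accelerator = Accelerator(gradient_accumulation_steps=4)

model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

for batch in train_dataloader:
    # Under accumulate(), gradient syncing and the optimizer/scheduler steps are
    # handled so that an update happens only every gradient_accumulation_steps batches.
    with accelerator.accumulate(model):
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()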

2 reactions
takiholadi commented, Aug 16, 2022

Hi @muellerzr, I don’t quite understand. I thought the sample sent by @takiholadi has no accelerator.num_processes. Why should we revise that?

Since: https://github.com/huggingface/accelerate/blob/b0f8189d34fa42821ca041e2cba161db864c76b5/src/accelerate/scheduler.py#L31-L32
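The linked lines are where the prepared scheduler compensates for multi-process training: when batches are not split across devices, every process consumes its own batch per step, so the wrapped scheduler advances once per process to stay aligned with the single-process step count. Roughly, and only as a simplified illustration (not the real class):

# Simplified sketch of what accelerate's AcceleratedScheduler does
# (names and details are approximate, not the actual implementation)
class AcceleratedSchedulerSketch:
    def __init__(self, scheduler, num_processes, split_batches=False):
        self.scheduler = scheduler
        self.num_processes = num_processes
        self.split_batches = split_batches

    def step(self, *args, **kwargs):
        if self.split_batches:
            # one global batch per step -> advance the schedule once
            self.scheduler.step(*args, **kwargs)
        else:
            # each process saw its own batch -> advance once per process
            for _ in range(self.num_processes):
                self.scheduler.step(*args, **kwargs)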

Read more comments on GitHub >

Top Results From Across the Web

  • How to change optimizer and lr scheduler in the middle of ...
    I need to train a model multi-phases with a pre-trained backbone. For the first 10 epochs, I want to have the backbone frozen...
  • How to change optimizer and lr scheduler in the ... - GitHub
    You can try the on_train_epoch_start method in the callback and reconfigure the optimizers and schedulers the way you want. Here are some links...
  • PyTorch LR Scheduler - Adjust The Learning Rate ... - YouTube
    In this PyTorch tutorial we learn how to use a Learning Rate (LR) Scheduler to adjust the LR during training. Models often...
  • StepLR — PyTorch 1.13 documentation
    When last_epoch=-1, sets initial lr as lr. Parameters: optimizer (Optimizer) – Wrapped optimizer. step_size (int) – Period of learning rate decay.
  • Guide to Pytorch Learning Rate Scheduling | Kaggle
    Sets the learning rate of each parameter group to the initial lr times a given ... Learning Rate = ", optimizer.param_groups[0]["lr"]) scheduler.step() ...
