
bug in gradient accumulation example

See original GitHub issue

https://github.com/huggingface/accelerate/blob/450d51ce0191020408bd3481bde85fe1dabaf289/examples/by_feature/gradient_accumulation.py#L160

Shouldn’t this condition be step % gradient_accumulation_steps != 0? We want to skip gradient averaging on every step except each gradient_accumulation_steps-th step.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

3 reactions
nilesh2797 commented, Jun 11, 2022

Let’s say gradient_accumulation_steps = 5. In the current code, optimizer.step() is called at steps 1, 2, 3, 4, 6, 7, 8, 9, 11, ..., but it should be called at steps 0, 5, 10, 15, 20, ..., right? Basically, the call is in the opposite logical block.
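The step arithmetic in this comment can be checked with a quick simulation of the loop’s control flow (plain Python, no Accelerate needed; gas stands in for gradient_accumulation_steps, and the helper name is just for illustration):

```python
gas = 5  # gradient_accumulation_steps

def optimizer_steps(accumulate, total_steps=12):
    """Steps at which optimizer.step() runs, given the condition that
    guards the gradient-accumulation (skip-the-update) branch."""
    return [s for s in range(total_steps) if not accumulate(s)]

buggy = optimizer_steps(lambda s: s % gas == 0)  # condition as written
fixed = optimizer_steps(lambda s: s % gas != 0)  # proposed fix

print(buggy)  # [1, 2, 3, 4, 6, 7, 8, 9, 11]
print(fixed)  # [0, 5, 10]
```

With the condition inverted, the update fires on every step except the accumulation boundaries, which matches the sequence described above.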

0 reactions
nilesh2797 commented, Jul 22, 2022

Thanks for the fix!

Read more comments on GitHub >

Top Results From Across the Web

  • Trying to accumulate gradients in Pytorch, but getting ...
    Here's an example which accumulates the gradient, not the loss: model = nn. ... Since your gradients will be accumulated twice.
  • Gradient accumulation: should I duplicate data? - Transformers
    Hello! I am using gradient accumulation to simulate bigger batches when fine-tuning. However, I remember to have seen some notebooks in the ...
  • Understanding Clouds from Satellite Images | Kaggle
    A trick to use bigger batches for training: gradient accumulation ... In most cases (not all, for example in GANs) using bigger batches...
  • [D] Does gradient accumulation achieve anything different ...
    I'm trying to understand the practical justification for gradient accumulation (i.e. running with an effectively larger batch size by summing ...
  • gradient-accumulator - PyPI
    There is also an example of how to use gradient accumulation with mixed precision here. Adaptive gradient clipping. There has also been added...
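Several of these results rest on the same identity: summing per-micro-batch gradients, with each micro-batch loss scaled by 1/num_micro_batches, reproduces the full-batch gradient, which is why accumulation simulates a larger batch. A minimal sketch with a hand-derived gradient (a toy linear model chosen for illustration, not code from any of the linked posts):

```python
# Check that gradient accumulation over micro-batches reproduces the
# full-batch gradient for y_hat = w * x with mean squared-error loss.

def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over the given samples
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full_grad = grad_mse(w, xs, ys)

# Two micro-batches of size 2; scale each micro-batch loss (and hence its
# gradient) by 1/num_micro, then sum -- the accumulated gradient.
num_micro = 2
acc = 0.0
for i in range(0, len(xs), 2):
    acc += grad_mse(w, xs[i:i + 2], ys[i:i + 2]) / num_micro

print(abs(acc - full_grad) < 1e-12)  # True
```

This is also why the loss in the Accelerate example is divided by gradient_accumulation_steps before each backward pass: without that scaling the accumulated gradient would be num_micro times too large.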
