
bug in gradient accumulation example

See original GitHub issue

https://github.com/huggingface/accelerate/blob/450d51ce0191020408bd3481bde85fe1dabaf289/examples/by_feature/gradient_accumulation.py#L160

Shouldn’t this condition be step % gradient_accumulation_steps != 0? We want to skip gradient averaging on every step except each gradient_accumulation_steps-th step.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

3 reactions
nilesh2797 commented, Jun 11, 2022

Let’s say gradient_accumulation_steps = 5. In the current code, optimizer.step() is called at steps 1, 2, 3, 4, 6, 7, 8, 9, 11, ..., but it should be called at steps 0, 5, 10, 15, 20, ..., right? Basically, the call is in the opposite logical block.
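The step arithmetic in this comment can be checked with a quick simulation of the loop’s control flow (plain Python, no Accelerate needed; gas stands in for gradient_accumulation_steps, and the helper name is just for illustration):

```python
gas = 5  # gradient_accumulation_steps

def optimizer_steps(accumulate, total_steps=12):
    """Steps at which optimizer.step() runs, given the condition that
    guards the gradient-accumulation (skip-the-update) branch."""
    return [s for s in range(total_steps) if not accumulate(s)]

buggy = optimizer_steps(lambda s: s % gas == 0)  # condition as written
fixed = optimizer_steps(lambda s: s % gas != 0)  # proposed fix

print(buggy)  # [1, 2, 3, 4, 6, 7, 8, 9, 11]
print(fixed)  # [0, 5, 10]
```

With the condition inverted, the update fires on every step except the accumulation boundaries, which matches the sequence described above.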

0 reactions
nilesh2797 commented, Jul 22, 2022

Thanks for the fix!

Read more comments on GitHub >

Top Results From Across the Web

  • Trying to accumulate gradients in Pytorch, but getting ...
    Here's an example which accumulates the gradient, not the loss: model = nn. ... Since your gradients will be accumulated twice.
  • Gradient accumulation: should I duplicate data? - Transformers
    Hello! I am using gradient accumulation to simulate bigger batches when fine-tuning. However, I remember to have seen some notebooks in the ...
  • Understanding Clouds from Satellite Images | Kaggle
    A trick to use bigger batches for training: gradient accumulation ... In most cases (not all, for example in GANs) using bigger batches...
  • [D] Does gradient accumulation achieve anything different ...
    I'm trying to understand the practical justification for gradient accumulation (i.e. running with an effectively larger batch size by summing ...
  • gradient-accumulator - PyPI
    There is also an example of how to use gradient accumulation with mixed precision here. Adaptive gradient clipping. There has also been added...
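Several of these results rest on the same identity: summing per-micro-batch gradients, with each micro-batch loss scaled by 1/num_micro_batches, reproduces the full-batch gradient, which is why accumulation simulates a larger batch. A minimal sketch with a hand-derived gradient (a toy linear model chosen for illustration, not code from any of the linked posts):

```python
# Check that gradient accumulation over micro-batches reproduces the
# full-batch gradient for y_hat = w * x with mean squared-error loss.

def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over the given samples
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full_grad = grad_mse(w, xs, ys)

# Two micro-batches of size 2; scale each micro-batch loss (and hence its
# gradient) by 1/num_micro, then sum -- the accumulated gradient.
num_micro = 2
acc = 0.0
for i in range(0, len(xs), 2):
    acc += grad_mse(w, xs[i:i + 2], ys[i:i + 2]) / num_micro

print(abs(acc - full_grad) < 1e-12)  # True
```

This is also why the loss in the Accelerate example is divided by gradient_accumulation_steps before each backward pass: without that scaling the accumulated gradient would be num_micro times too large.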
