Does (horovod + learning rate decay) work properly?
Hi,
I’m using Horovod on 3 GPUs (in a single machine) with learning rate decay (ReduceLROnPlateau). But I found something strange while looking at the printed log: it seems that the learning rate for each process (GPU) is not synchronized.
Actually, this is my first time using Horovod and I don’t know what’s going on inside it. I would appreciate it if you could let me know if I am mistaken!
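For reference, here is a minimal sketch of the kind of setup described, assuming PyTorch + horovod.torch; the model, data, and hyperparameters are placeholders for illustration, and only the scheduler handling matters. Each rank steps ReduceLROnPlateau with its own local loss.

```python
import torch
import torch.nn.functional as F
import horovod.torch as hvd

hvd.init()

# Placeholder model and data; the real setup runs on 3 GPUs with a real model.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=2)

for epoch in range(20):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Each rank passes its *local* loss to the scheduler, so ranks can see
    # different values and decide to decay the learning rate at different epochs.
    scheduler.step(loss.item())
    print(f"rank {hvd.rank()} epoch {epoch} "
          f"lr {optimizer.param_groups[0]['lr']:.6f}")
```

Run with e.g. `horovodrun -np 3 python train.py`; because each rank sees different data (and hence a different loss), the printed learning rates can drift apart across ranks, which matches the symptom in the log.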
Top Results From Across the Web
- AdaSum with Horovod: Scaling DNN training to many GPUs always comes at a convergence degradation. This is because with larger batch sizes, gradients are averaged and...
- Why should we scale the learning rate? · Issue #384 - GitHub: The idea is to scale the learning rate linearly with the batch size to preserve the number of epochs needed for the model...
- Distributed Deep Learning with Horovod | NVIDIA: How does Deep Learning training work? ... import horovod.tensorflow as hvd ... Google published a paper “Don't Decay the Learning Rate, Increase the...”
- Why is your Horovod slower than the usual?: This article discusses what can be done to train faster with Horovod and some common bottlenecks that could cause a slow down on...
- Scaling Deep Learning Training - Cray User Group: Training with large learning rates is not stable in the initial stages of... Linear scaling of learning-rate (N * η) followed by...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hey, thanks for the report, that’s a great catch! Passing avg_loss instead of loss into the scheduler should already fix this! Sorry for the issue. 😃 That way, all processes use the same loss, so the scheduler stays the same across all workers.

Yeah, exactly! Sorry I wasn’t clearer in my description. Thank you!
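A minimal sketch of what that fix looks like inside a loop like the one above, assuming horovod.torch; avg_loss here is just the averaged metric the comment refers to:

```python
# Average the plateau metric across all ranks before stepping the scheduler,
# so every worker makes the same ReduceLROnPlateau decision.
avg_loss = hvd.allreduce(loss.detach(), name="avg_loss")  # averages over ranks by default
scheduler.step(avg_loss.item())
```

Since every rank now feeds the scheduler the same number, the learning-rate schedule stays identical across workers.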