[Feature Request] EarlyStopping logging on rank 0 only
See original GitHub issue

🚀 Feature
Toggle switch to turn off EarlyStopping logging for processes other than rank 0
Motivation
EarlyStopping logging can be a bit spammy when viewing aggregate logs across all processes. For example, with my custom CloudWatch logger:
xnpww4j62d-algo-1-vr8o9 | 14:17:49 [INFO] Epoch 9: [ Training | 100% iter# 49/49 19.28 batches/s ] train/loss_step=0.764418, train/loss_epoch=0.773, train/acc=0.68356
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] Epoch 9: [ Validation | 100% iter# 10/10 2.34 batches/s ] val/loss_step=1.253475, val/loss_epoch=1.278802, val/acc=0.6107
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 0] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 2] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 1] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 3] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 4] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 5] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 6] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 7] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:18:20 [INFO] Epoch 14: [ Training | 100% iter# 49/49 18.94 batches/s ] train/loss_step=0.611876, train/loss_epoch=0.55, train/acc=0.80096
xnpww4j62d-algo-1-vr8o9 | 14:18:26 [INFO] Epoch 14: [ Validation | 100% iter# 10/10 2.29 batches/s ] val/loss_step=0.748429, val/loss_epoch=0.828285, val/acc=0.726
Pitch
It would be nice if we could turn off printing of this message on processes other than rank 0. I understand that this output is actually useful to monitor in some cases, so the toggle could default to False.
Alternatives
Custom EarlyStopping callback?
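As an interim workaround that avoids subclassing the callback entirely, one could attach a `logging.Filter` to the callback's logger so only rank 0's messages survive. This is a sketch under two assumptions: the logger name matches the callback's module path (`pytorch_lightning.callbacks.early_stopping`), and the process rank is exposed via the `RANK` environment variable, as `torchrun` sets it; adjust both for your setup.

```python
import logging
import os


class RankZeroFilter(logging.Filter):
    """Drop log records emitted on non-zero ranks.

    Assumes the global rank is available in the RANK env var (as set by
    torchrun); other launchers may use a different variable.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        # Keep the record only when this process is rank 0.
        return int(os.environ.get("RANK", "0")) == 0


# Attach to the (assumed) logger name of the EarlyStopping callback:
logging.getLogger("pytorch_lightning.callbacks.early_stopping").addFilter(
    RankZeroFilter()
)
```

With this filter installed, the eight "Metric val/acc improved ..." lines in the log above would collapse to the single rank-0 line.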
Issue Analytics
- Created: a year ago
- Reactions: 2
- Comments: 5 (4 by maintainers)
I think we can add this flag. It's useful for metrics logged with `sync_dist=True`. The relevant piece of code is here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/dd475183227644a8d22dca3deb18c99fb0a9b2c4/pytorch_lightning/callbacks/early_stopping.py#L256-L261
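The flag would gate the callback's status messages on the process rank. A hypothetical pure-Python sketch of that guard (the function name and signature are illustrative, not Lightning's actual API):

```python
import logging
from typing import Optional

log = logging.getLogger("early_stopping")


def log_improvement(
    rank: int, message: str, log_rank_zero_only: bool = False
) -> Optional[str]:
    """Emit the improvement message, respecting the proposed flag.

    rank: this process's global rank (0 .. world_size - 1).
    log_rank_zero_only: when True, only rank 0 emits the message.
    Returns the formatted line, or None when suppressed.
    """
    if log_rank_zero_only and rank != 0:
        return None  # non-zero ranks stay silent
    line = f"[rank: {rank}] {message}"
    log.info(line)
    return line
```

With `log_rank_zero_only=False` (the proposed default), behavior is unchanged and every rank still logs, preserving the per-rank visibility that is sometimes useful with unsynced metrics.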
Hi @carmocca! I would like to take this up.