
tqdm progress bar in v1.6 is slower than v1.5

See original GitHub issue

🐛 Bug

Training the boring model is slower on 1.6.0 (and master) than on 1.5.10 when the tqdm progress bar is enabled.

time: 9.177419368177652  # 1.5.10
time: 12.62373811379075  # 1.6.0

To Reproduce

from time import monotonic
from pytorch_lightning import Trainer
from tests.helpers import BoringModel

def main():
    model = BoringModel()
    trainer = Trainer(
        max_epochs=100,
        enable_model_summary=False,
        enable_checkpointing=False,
        logger=False,
        # profiler="advanced",
    )
    t0 = monotonic()
    trainer.fit(model)
    print("time:", monotonic() - t0)

if __name__ == "__main__":
    main()

Expected behavior

No regression in speed.

Environment

  • PyTorch Lightning Version (e.g., 1.5.0): 1.5.10 vs. 1.6.0
  • PyTorch Version (e.g., 1.10): 1.11.0 and 1.8.2
  • Python version (e.g., 3.9): 3.9
  • OS (e.g., Linux): Linux
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

The benchmark result below suggests that 8a549a55 (#11213) made the above code somewhat slower. Profiling the script points to the expensive bar.refresh() call that #11213 added and that still exists on master here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a6e9bc2943bf2c82036e31a4948bd8caa54957ee/pytorch_lightning/callbacks/progress/tqdm_progress.py#L390

Benchmark result: https://www.akihironitta.com/lightning-benchmarks/#bench_fit.AnotherBoringBenchmark.time_it?branch=master&branch=release%2F1.5.x&pytorch=1.11&pytorch=1.8.2&torchvision=&pip%2Bprotobuf=3.20.1
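
For context, here is a minimal, self-contained sketch of the pattern being discussed (the loop and bar names are illustrative, not the actual Lightning source): setting a tqdm bar’s counter and forcing a redraw with refresh() on every batch, versus letting update() repaint on tqdm’s own schedule.

from tqdm import tqdm

total_batches = 64

# Illustrative version of the pattern added in #11213: set the counter and
# force a redraw every batch. refresh() repaints unconditionally, which is
# what the profile above shows as the hot spot.
bar = tqdm(total=total_batches, leave=False)
for i in range(total_batches):
    bar.n = i + 1
    bar.refresh()  # unconditional repaint on every iteration
bar.close()

# Illustrative version without the manual refresh: update() only repaints
# when tqdm's own rate limiting (mininterval/miniters) allows it.
bar = tqdm(total=total_batches, leave=False)
for _ in range(total_batches):
    bar.update(1)
bar.close()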


Update: I just commented out the refresh() call in https://github.com/PyTorchLightning/pytorch-lightning/tree/perf-tqdm and confirmed that this brings performance back to roughly the 1.5.10 level.


However, I’m not sure we really need to fix this, because:

  • users who need maximum performance will disable the progress bar anyway (see the sketch below) and usually track progress through their logger instead
  • bar.refresh() makes the progress bar look smoother.
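
As a reference point, turning the bar off for a throughput-sensitive run looks roughly like the sketch below. It reuses the repro script above; enable_progress_bar is the Trainer flag that disables the bar entirely, so the per-batch refresh cost disappears.

# Minimal sketch: the same boring-model run without any progress bar.
from pytorch_lightning import Trainer
from tests.helpers import BoringModel  # same test helper as in the repro above


def main():
    model = BoringModel()
    trainer = Trainer(
        max_epochs=100,
        enable_model_summary=False,
        enable_checkpointing=False,
        enable_progress_bar=False,  # no tqdm bar, hence no refresh() calls
        logger=False,
    )
    trainer.fit(model)


if __name__ == "__main__":
    main()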

cc @tchaton @rohitgr7 @awaelchli @borda @akihironitta

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

3 reactions
akihironitta commented on Jun 3, 2022

Here’s the result from the following script with max_epochs=1000.

Run this script (the lines prefixed with + are added only when measuring with rich; drop them for the tqdm measurements):
# Apply the patch when measuring with rich
from time import monotonic
from pytorch_lightning import Trainer
+from pytorch_lightning.callbacks import RichProgressBar
from tests.helpers import BoringModel


def main():
    model = BoringModel()
    trainer = Trainer(
        max_epochs=1000,
        enable_model_summary=False,
        enable_checkpointing=False,
        logger=False,
+        callbacks=RichProgressBar(),
    )
    t0 = monotonic()
    trainer.fit(model)
    print("time:", monotonic() - t0)


if __name__ == "__main__":
    main()

version     tqdm      rich
1.5.10      88.84     87.10
1.6.0       118.52    528.69
master      107.86    519.56
perf-tqdm   82.57     n/a

I haven’t looked into why the rich progress bar slows down so dramatically from 1.5.10 to 1.6.0, but I will follow up.


Here’s the number of refresh calls.

Run this command

The same script but with max_epochs=1.

$ for i in "1.5.10" "1.6.0" "master" "perf-tqdm"; do git checkout $i; python -m cProfile main.py 2> /dev/null | grep refresh; done

version     refresh calls (tqdm)
1.5.10      75
1.6.0       267
master      271
perf-tqdm   77

2 reactions
carmocca commented on Jun 30, 2022

Looking at the tqdm.update implementation, tqdm already has logic to refresh the bar automatically, so perhaps it would be a good idea to remove our manual refresh call.

https://github.com/tqdm/tqdm/blob/4f208e72552c4d916aa4fe6a955349ee8b2ed353/tqdm/std.py#L1256
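
For illustration, here is a small standalone sketch (not Lightning code) of that built-in throttling: with mininterval and miniters set, update() decides when to repaint, so an explicit refresh() per step is normally unnecessary.

from time import sleep
from tqdm import tqdm

# tqdm's own rate limiting: update() repaints at most roughly every
# `mininterval` seconds once `miniters` iterations have accumulated.
bar = tqdm(total=1000, mininterval=0.25, miniters=1)
for _ in range(1000):
    sleep(0.001)   # stand-in for one training step
    bar.update(1)  # no manual bar.refresh() needed
bar.close()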
