tqdm progress bar in v1.6 is slower than v1.5

🐛 Bug

Training the BoringModel is slower with 1.6.0 (and master) than with 1.5.10 when the tqdm progress bar is enabled.

time: 9.177419368177652  # 1.5.10
time: 12.62373811379075  # 1.6.0

To Reproduce

from time import monotonic
from pytorch_lightning import Trainer
from tests.helpers import BoringModel

def main():
    model = BoringModel()
    trainer = Trainer(
        max_epochs=100,
        enable_model_summary=False,
        enable_checkpointing=False,
        logger=False,
        # profiler="advanced",
    )
    t0 = monotonic()
    trainer.fit(model)
    print("time:", monotonic() - t0)

if __name__ == "__main__":
    main()

Expected behavior

No regression in speed.

Environment

PyTorch Lightning Version (e.g., 1.5.0): 1.5.10 vs. 1.6.0
PyTorch Version (e.g., 1.10): 1.11.0 and 1.8.2
Python version (e.g., 3.9): 3.9
OS (e.g., Linux): Linux
CUDA/cuDNN version:
GPU models and configuration:
How you installed PyTorch (conda, pip, source):
If compiling from source, the output of torch.__config__.show():
Any other relevant information:

Additional context

The benchmark result linked below suggests that 8a549a55 (#11213) made the above code slower. Profiling the script suggests that the cause is the expensive bar.refresh() call that #11213 added, which still exists in master here: https://github.com/PyTorchLightning/pytorch-lightning/blob/a6e9bc2943bf2c82036e31a4948bd8caa54957ee/pytorch_lightning/callbacks/progress/tqdm_progress.py#L390

Benchmark result: https://www.akihironitta.com/lightning-benchmarks/#bench_fit.AnotherBoringBenchmark.time_it?branch=master&branch=release%2F1.5.x&pytorch=1.11&pytorch=1.8.2&torchvision=&pip%2Bprotobuf=3.20.1

Update: I just commented out the refresh() call in https://github.com/PyTorchLightning/pytorch-lightning/tree/perf-tqdm and confirmed that this brings performance back to around the level of 1.5.10.

However, I’m not sure if we really need to fix this because:

users who need maximum performance will disable the progress bar anyway, and they usually follow progress through their logger
(and bar.refresh() makes the progress bar look smoother.)

cc @tchaton @rohitgr7 @awaelchli @borda @akihironitta

Top GitHub Comments

akihironitta commented, Jun 3, 2022

Here’s the result from the following script with max_epochs=1000.

Run this script

# Lines prefixed with + are a patch: apply them only when measuring with rich
from time import monotonic
from pytorch_lightning import Trainer
+from pytorch_lightning.callbacks import RichProgressBar
from tests.helpers import BoringModel


def main():
    model = BoringModel()
    trainer = Trainer(
        max_epochs=1000,
        enable_model_summary=False,
        enable_checkpointing=False,
        logger=False,
+        callbacks=RichProgressBar(),
    )
    t0 = monotonic()
    trainer.fit(model)
    print("time:", monotonic() - t0)


if __name__ == "__main__":
    main()

version	tqdm (s)	rich (s)
1.5.10	88.84	87.10
1.6.0	118.52	528.69
master	107.86	519.56
perf-tqdm	82.57	n/a

I haven’t looked into why the rich progress bar slows down so much from 1.5.10 to 1.6.0, but I will follow up.

Here’s the number of refresh calls.

Run this command

The same script but with max_epochs=1.

$ for i in "1.5.10" "1.6.0" "master" "perf-tqdm"; do git checkout "$i"; python -m cProfile main.py 2> /dev/null | grep refresh; done

version	refresh calls (tqdm)
1.5.10	75
1.6.0	267
master	271
perf-tqdm	77
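
For reference, the same kind of count can be read programmatically instead of grepping the cProfile output. The following is only a minimal sketch: FakeBar, run and count_calls are hypothetical names made up for illustration and are not part of tqdm or PyTorch Lightning. It uses only the standard-library cProfile and pstats modules, and it counts calls by function name alone, so any function called "refresh" would be included.

```python
import cProfile
import pstats


class FakeBar:
    """Hypothetical stand-in for a progress bar (not tqdm)."""

    def refresh(self):
        # A real bar would format and write a line to the terminal here.
        pass

    def update(self, n=1):
        # This toy bar refreshes on every update, the worst case.
        self.refresh()


def run(steps):
    bar = FakeBar()
    for _ in range(steps):
        bar.update()


def count_calls(profiler, func_name):
    """Sum the call counts of every profiled function named `func_name`."""
    stats = pstats.Stats(profiler)
    total = 0
    # Keys are (filename, line number, function name);
    # values are (primitive calls, total calls, tottime, cumtime, callers).
    for (_filename, _lineno, name), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        if name == func_name:
            total += ncalls
    return total


profiler = cProfile.Profile()
profiler.enable()
run(50)
profiler.disable()

refresh_calls = count_calls(profiler, "refresh")
print("refresh calls:", refresh_calls)
```

To apply the same idea to the real script, one would presumably wrap trainer.fit(model) between enable() and disable() instead of run(50).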

carmocca commented, Jun 30, 2022

The tqdm.update implementation already has logic to refresh the bar automatically, so it is probably a good idea to remove our manual refresh call.

https://github.com/tqdm/tqdm/blob/4f208e72552c4d916aa4fe6a955349ee8b2ed353/tqdm/std.py#L1256
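
To illustrate why a manual refresh on top of update can add so many redraws, here is a simplified model of interval-based throttling. It is a sketch under assumptions, not tqdm's actual code: ThrottledBar, FakeClock and simulate are hypothetical names, time is simulated in integer milliseconds to keep the run deterministic, and only a mininterval-style check is modelled (tqdm has a mininterval parameter defaulting to 0.1 s and also considers miniters, which is left out here).

```python
class FakeClock:
    """Deterministic clock in integer milliseconds (hypothetical helper)."""

    def __init__(self):
        self.now_ms = 0

    def advance(self, ms):
        self.now_ms += ms

    def __call__(self):
        return self.now_ms


class ThrottledBar:
    """Simplified model of a bar whose update() redraws at most once per interval."""

    def __init__(self, clock, mininterval_ms=100):
        self.clock = clock
        self.mininterval_ms = mininterval_ms
        self.n = 0
        self.last_render_ms = clock()
        self.renders = 0

    def refresh(self):
        # Stand-in for the expensive part: formatting and writing the bar.
        self.renders += 1

    def update(self, n=1):
        self.n += n
        now = self.clock()
        # Redraw only if enough time has passed since the last automatic redraw.
        if now - self.last_render_ms >= self.mininterval_ms:
            self.refresh()
            self.last_render_ms = now


def simulate(steps, step_ms, manual_refresh):
    """Run `steps` updates, each `step_ms` apart, and return the redraw count."""
    clock = FakeClock()
    bar = ThrottledBar(clock)
    for _ in range(steps):
        clock.advance(step_ms)
        bar.update()
        if manual_refresh:
            bar.refresh()  # the extra, unconditional redraw
    return bar.renders


auto_renders = simulate(1000, 10, manual_refresh=False)
manual_renders = simulate(1000, 10, manual_refresh=True)
print("update only:", auto_renders)
print("update + manual refresh:", manual_renders)
```

In this model, relying on update alone caps the redraws by elapsed time, while the unconditional refresh adds one redraw per step regardless of how fast the steps are.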