Benchmark tests are flaky
🐛 Bug
I have seen the benchmark test fail with the following error quite a number of times:
> assert np.mean(diffs) < max_diff, f"Lightning diff {diffs} was worse than vanilla PT (threshold {max_diff})"
> E AssertionError: Lightning diff [1.30557255e+00 3.60853071e-07] was worse than vanilla PT (threshold 0.0002)
> E assert 0.6527864532870287 < 0.0002
> E + where 0.6527864532870287 = <function mean at 0x7fcd121d6320>(array([1.30557255e+00, 3.60853071e-07]))
> E + where <function mean at 0x7fcd121d6320> = np.mean
> tests/benchmarks/test_basic_parity.py:38: AssertionError
which is configured at:
- https://github.com/PyTorchLightning/pytorch-lightning/blob/bc1c8b926c5072f58f42ad4b7413a8ef5c904c85/.azure-pipelines/gpu-tests.yml#L121-L123
- https://github.com/PyTorchLightning/pytorch-lightning/blob/bc1c8b926c5072f58f42ad4b7413a8ef5c904c85/.azure-pipelines/gpu-benchmark.yml#L39
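For context, a minimal sketch of the shape of that parity check (the variable names follow the assertion message above; the helper function itself is illustrative, not the actual test code):

```python
import numpy as np

# Simplified sketch of the check at tests/benchmarks/test_basic_parity.py:38.
# `diffs` holds the measured differences between the Lightning run and the
# vanilla PyTorch run; their mean has to stay below a small threshold.
def assert_parity(diffs, max_diff=0.0002):
    diffs = np.asarray(diffs)
    assert np.mean(diffs) < max_diff, (
        f"Lightning diff {diffs} was worse than vanilla PT (threshold {max_diff})"
    )

# The failing case from the log above: the first entry (~1.3) dominates the mean,
# so the assertion fails even though the second entry is tiny.
assert_parity([1.30557255e+00, 3.60853071e-07])  # raises AssertionError
```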
To Reproduce
Expected behavior
No error raised so we can always keep our CI green 🟢
Environment
Additional context
Top Results From Across the Web
What are Flaky Tests? | TeamCity CI/CD Guide - JetBrains
Flaky tests are tests that return new results, despite there being no changes to code. Find out why flaky tests matter and how...
Read more >Flaky Tests: Getting Rid Of A Living Nightmare In Testing
The Science Of Flaky Tests # A flaky test is one that fails to produce the same result each time the same analysis...
Read more >Flaky tests - GitLab Docs
It's a test that sometimes fails, but if you retry it enough times, it passes, eventually. What are the potential cause for a...
Read more >A Pragmatist's Guide to Flaky Test Management
A test is “flaky” whenever it can produce both “passing” and “failing” results for the same code. Test flakiness is a bit like...
Read more >How to reduce flaky test failures - CircleCI
Flaky tests result mostly from insufficient test data, narrow test environment scope, and complex technology. Some other factors that play a ...
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found

Thank you @carmocca for asking! No, actually. I checked the benchmarks at around the time of the commit, and it turned out that we had never run either the pure PyTorch benchmark or the Lightning benchmark with `benchmark=True`. 00211c1 made only the Lightning benchmark run with `benchmark=True` (as it became turned on by default), which seemingly led to more memory usage only in the Lightning benchmark. I will double-check by running both the PyTorch and Lightning benchmarks with the flag turned on/off and will update here.
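For reference, Lightning's `Trainer(benchmark=...)` flag maps onto `torch.backends.cudnn.benchmark`, so the two benchmarks only measure the same thing when that flag is set the same way on both sides. A rough sketch of what "running both with the flag turned on/off" looks like (the Trainer arguments besides `benchmark` are illustrative):

```python
import torch
from pytorch_lightning import Trainer

cudnn_benchmark = True  # flip to False for the second run of each benchmark

# Pure PyTorch benchmark: cudnn autotuning is controlled via the global flag.
torch.backends.cudnn.benchmark = cudnn_benchmark

# Lightning benchmark: Trainer(benchmark=...) sets the same flag internally.
trainer = Trainer(benchmark=cudnn_benchmark, max_epochs=1)
```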
I ran the benchmark with `tests.helpers.advanced_models.ParityModuleCIFAR` again (but this time with `Trainer(benchmark=False)` explicitly specified) and confirmed that there is no difference in memory usage before and after the commit 00211c1. I'll conclude this issue by adding `Trainer(benchmark=False)` to the existing benchmarks.
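A minimal sketch of that conclusion, assuming the Lightning half of the parity benchmark builds its Trainer roughly like this (everything except the `benchmark` argument is illustrative; the real code lives in tests/benchmarks/test_basic_parity.py):

```python
from pytorch_lightning import Trainer
from tests.helpers.advanced_models import ParityModuleCIFAR

# Hypothetical, simplified version of the Lightning side of the parity benchmark.
model = ParityModuleCIFAR()
trainer = Trainer(
    max_epochs=1,
    benchmark=False,  # pin cudnn benchmarking off so Lightning matches vanilla PyTorch
)
trainer.fit(model)
```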