Benchmark tests are flaky

🐛 Bug

I have seen the benchmark tests fail with the following error quite a number of times:

>       assert np.mean(diffs) < max_diff, f"Lightning diff {diffs} was worse than vanilla PT (threshold {max_diff})"
E       AssertionError: Lightning diff [1.30557255e+00 3.60853071e-07] was worse than vanilla PT (threshold 0.0002)
E       assert 0.6527864532870287 < 0.0002
E        +  where 0.6527864532870287 = <function mean at 0x7fcd121d6320>(array([1.30557255e+00, 3.60853071e-07]))
E        +    where <function mean at 0x7fcd121d6320> = np.mean

tests/benchmarks/test_basic_parity.py:38: AssertionError

https://dev.azure.com/PytorchLightning/pytorch-lightning/_build/results?buildId=62256&view=logs&j=46f8844e-d2e7-54ff-5b07-42679d5a20a0&t=d0c9a9e3-f716-53b7-5c53-27dd0366278d

which is configured at:

https://github.com/PyTorchLightning/pytorch-lightning/blob/bc1c8b926c5072f58f42ad4b7413a8ef5c904c85/.azure-pipelines/gpu-tests.yml#L121-L123
https://github.com/PyTorchLightning/pytorch-lightning/blob/bc1c8b926c5072f58f42ad4b7413a8ef5c904c85/.azure-pipelines/gpu-benchmark.yml#L39
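
To illustrate why this check can flip on a single run: the assertion in the traceback compares the mean of the per-run diffs against the threshold, so one outlier is enough to fail it even when every other run is well within bounds. The sketch below replays the two diff values from the traceback using only the standard library. The helper name `exceeds_threshold` is made up for illustration and is not part of the test suite.

```python
from statistics import mean

# Values copied from the traceback above.
diffs = [1.30557255e+00, 3.60853071e-07]
max_diff = 0.0002


def exceeds_threshold(values, threshold):
    """Hypothetical helper mirroring the failing check: True when mean(values) is not below threshold."""
    return not (mean(values) < threshold)


# Count how many individual runs are actually over the threshold.
outliers = [d for d in diffs if d >= max_diff]

print(f"mean diff: {mean(diffs):.6f} (threshold {max_diff})")
print(f"runs over threshold: {len(outliers)} of {len(diffs)}")
print(f"check fails with all runs: {exceeds_threshold(diffs, max_diff)}")
print(f"check fails without the outlier: {exceeds_threshold(diffs[1:], max_diff)}")
```

Only one of the two runs is off, but it is off by several orders of magnitude, which is what drags the mean past the threshold.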

To Reproduce

Expected behavior

No error raised so we can always keep our CI green 🟢

Environment

Additional context

cc @carmocca @akihironitta @borda

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

akihironitta commented, May 24, 2022

> Doesn’t the pure torch benchmark that we check against also set benchmark=True?

Thank you @carmocca for asking! No, actually. I checked the benchmarks from around the time of that commit, and it turned out that we had never run either the pure PyTorch benchmark or the Lightning benchmark with benchmark=True. 00211c1 made only the Lightning benchmark run with benchmark=True (since it became enabled by default), which seemingly led to more memory usage in the Lightning benchmark only.

I will double-check by running both PyTorch and Lightning benchmarks with the flag turned on/off and will update here.

akihironitta commented, May 30, 2022

I ran the benchmark with tests.helpers.advanced_models.ParityModuleCIFAR again (but this time with Trainer(benchmark=False) explicitly specified) and confirmed that there’s no difference in the memory usage before and after the commit 00211c1.

I’ll conclude this issue by adding Trainer(benchmark=False) to the existing benchmarks.
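
For the pure PyTorch side of such a comparison, a rough sketch of pinning the same setting is shown below. It relies on the Trainer's `benchmark` argument controlling `torch.backends.cudnn.benchmark`. The context manager `cudnn_benchmark` and the placeholder `run_pure_pytorch_benchmark` are hypothetical names introduced here for illustration; they do not come from the Lightning test suite.

```python
from contextlib import contextmanager

import torch


@contextmanager
def cudnn_benchmark(enabled: bool):
    """Hypothetical helper: temporarily set torch.backends.cudnn.benchmark, then restore it."""
    previous = torch.backends.cudnn.benchmark
    torch.backends.cudnn.benchmark = enabled
    try:
        yield
    finally:
        # Restore the earlier value so other tests are unaffected.
        torch.backends.cudnn.benchmark = previous


def run_pure_pytorch_benchmark():
    """Placeholder for the vanilla PyTorch training loop; here it only reports the flag it ran with."""
    return torch.backends.cudnn.benchmark


# Run the vanilla side with the flag off, matching Trainer(benchmark=False).
with cudnn_benchmark(False):
    flag_during_run = run_pure_pytorch_benchmark()

print(f"cudnn.benchmark during the pure PyTorch run: {flag_during_run}")
```

Keeping the flag identical on both sides means any remaining difference in memory or timing is attributable to something other than cuDNN autotuning.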
