
Flaky tests and circuit operation selection


Introduction

Several tests, including test_clifford_circuit_2 in Cirq/cirq-core/cirq/sim/clifford/clifford_simulator_test.py and test_example_runs_bb84 in Cirq/examples/examples_test.py, seem to be flaky when all seed-setting code (e.g. np.random.seed(0) or tf.random.set_seed(0)) is commented out.

For instance, at commit 8cef3d9dc16b27e3b10184e1b72afa764efe590d (version 0.11.0), test_clifford_circuit_2[qubits0] and test_clifford_circuit_2[qubits1] fail ~24% and ~30% of the time, respectively (each out of 500 runs), compared to 0% of the time (each out of 500 runs) when the seed-setting code is left in place. Similarly, test_example_runs_bb84 fails ~32% of the time (out of 500 runs), compared to 0% of the time (out of 500 runs) with the seed-setting code left in place.
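As a rough sketch (not from the issue) of how such failure rates can be measured, one can re-run a single test in a loop and count nonzero exit codes; note this counts a run as failed if any parametrization of the test fails:

    import subprocess

    RUNS = 500
    failures = 0
    for _ in range(RUNS):
        proc = subprocess.run(
            ['pytest', '-q',
             'cirq-core/cirq/sim/clifford/clifford_simulator_test.py',
             '-k', 'test_clifford_circuit_2'],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        failures += proc.returncode != 0
    print('failure rate: {:.0%}'.format(failures / RUNS))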

test_clifford_circuit_2 tests the Clifford circuit simulator while test_example_runs_bb84 tests the example implementation of the BB84 QKD protocol.

Motivation

Some tests can be flaky with high failure rates, but the flakiness goes undetected while the seeds are set. We would like to stabilize such tests.

Environment

The tests were run with pytest 6.2.2 in a conda environment with Python 3.6.13, on Ubuntu 16.04.

Discussion

The flakiness appears to stem from the random selection of circuit operations, which the seed-setting code masks. For example, test_clifford_circuit_2 checks the value of sum(result.measurements['0'])[0] at the end of the test and ensures it is between 20 and 80. When the seed-setting code is left in place, the circuit being simulated is identical from run to run, and the value of sum(result.measurements['0'])[0] is always 49. When the seed-setting code is removed, however, a different circuit is generated on each run, and sum(result.measurements['0'])[0] is not always between 20 and 80. test_example_runs_bb84 is flaky when the seed-setting code is removed for a similar reason.
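To make the mechanism concrete, here is a minimal sketch (not the actual test code; the gate set and circuit length are made up) of a randomly generated Clifford circuit whose structure, and whose sampled outcomes, are both pinned down by the global seed, since Cirq simulators fall back to the global np.random state when no explicit seed is given:

    import numpy as np
    import cirq

    np.random.seed(0)  # remove this line and a different circuit is drawn per run

    qubits = cirq.LineQubit.range(2)
    gates = [cirq.H, cirq.S, cirq.X, cirq.Z]

    # Draw a random Clifford circuit; with the seed fixed, the same gate
    # sequence (and the same sampled outcomes) appears on every run.
    ops = [np.random.choice(gates)(np.random.choice(qubits)) for _ in range(10)]
    circuit = cirq.Circuit(ops, cirq.measure(qubits[0], key='0'))

    result = cirq.CliffordSimulator().run(circuit, repetitions=100)
    print(sum(result.measurements['0'])[0])  # identical on every run while seeded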

We would be interested in learning whether setting the seed restricts the class of circuit operations that can be selected, and whether there are better ways of handling the seed-setting code in these tests. We would be happy to raise a pull request to fix the tests and to incorporate any feedback you may have.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
95-martin-orion commented, Jul 21, 2021

Would saving the current circuit in a test configuration file be a good way of resolving the flakiness?

This seems reasonable, although we should first confirm that the current circuit generates a Bell state (i.e. an equal superposition of |0⟩ and |1⟩).
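(For concreteness, one way such a circuit could be frozen, assuming Cirq's JSON serialization is used; the helper names and file name below are illustrative, not from the issue:)

    import cirq

    # Hypothetical helpers: serialize the randomly generated circuit once,
    # then have the test load it instead of regenerating it each run.
    def save_circuit(circuit: cirq.Circuit, path: str = 'clifford_test_circuit.json') -> None:
        cirq.to_json(circuit, path)

    def load_circuit(path: str = 'clifford_test_circuit.json') -> cirq.Circuit:
        return cirq.read_json(path)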

Additionally, it does seem that the assertion should be adjusted to check whether sum(result.measurements['0'])[0] equals 49, rather than checking whether the value is between 20 and 80.

I’d prefer to keep the looser assertion on this, even though specifying a seed will enforce a specific result. The reason is that we want the test to capture our expectations: if the circuit produces a Bell state, each measurement has a 50% chance of producing a zero, so our expectation is that the number of zeros measured lies in a range such as 40 < x < 60.
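As a rough sanity check on that reasoning (assuming the test uses 100 repetitions, which the 20-80 bounds suggest): for a true Bell state the measured count is Binomial(100, 0.5), and the probability of it falling outside even the loose 20-80 window is negligible, so the loose assertion remains a strong test.

    from math import comb  # Python 3.8+

    n = 100  # assumed repetition count, matching the 20-80 bounds
    # Probability that a fair 50/50 count falls outside the 20..80 window:
    outside = sum(comb(n, k) for k in range(n + 1) if not 20 <= k <= 80) / 2 ** n
    print(outside)  # on the order of 1e-9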

@melonwater211, would you like to take on fixing this issue?

0 reactions
MichaelBroughton commented, Mar 28, 2022

I think we can close this now, since the tests use a fixed random seed and the function now uses a fixed random generator for that test.
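For reference, a minimal sketch of that fix pattern (with a simple Bell circuit standing in for the real test's circuit): seed the simulator itself rather than the global np.random state, so the test is reproducible without side effects on other tests.

    import numpy as np
    import cirq

    qubits = cirq.LineQubit.range(2)
    bell = cirq.Circuit(cirq.H(qubits[0]),
                        cirq.CNOT(qubits[0], qubits[1]),
                        cirq.measure(qubits[0], key='0'))

    # Pass the seed to the simulator instead of calling np.random.seed(0).
    simulator = cirq.CliffordSimulator(seed=np.random.RandomState(0))
    result = simulator.run(bell, repetitions=100)
    count = sum(result.measurements['0'])[0]
    assert 20 < count < 80  # loose bound still encodes the 50/50 expectation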
