Flaky tests and circuit operation selection
Introduction
Several tests, including test_clifford_circuit_2 in Cirq/cirq-core/cirq/sim/clifford/clifford_simulator_test.py and test_example_runs_bb84 in Cirq/examples/examples_test.py, appear to be flaky when all seed-setting code (e.g. np.random.seed(0) or tf.random.set_seed(0)) is commented out.

For instance, at commit 8cef3d9dc16b27e3b10184e1b72afa764efe590d (version 0.11.0), test_clifford_circuit_2[qubits0] and test_clifford_circuit_2[qubits1] fail ~24% and ~30% of the time, respectively (each out of 500 runs), compared to 0% of the time (each out of 500 runs) when the seed-setting code is left in place. Similarly, test_example_runs_bb84 fails ~32% of the time (out of 500 runs) compared to 0% of the time (out of 500 runs) when the seed-setting code is left in place.

test_clifford_circuit_2 tests the Clifford circuit simulator, while test_example_runs_bb84 tests the example implementation of the BB84 QKD protocol.
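For reference, failure rates like the ones above can be estimated by re-running a test many times and counting non-zero exit codes. A minimal sketch is below; the test id, run count, and use of subprocess are illustrative and not taken from the issue.

```python
# Hypothetical script for estimating a test's failure rate by repeated runs.
# The test id and run count below are placeholders, not from the issue.
import subprocess

TEST_ID = "cirq-core/cirq/sim/clifford/clifford_simulator_test.py::test_clifford_circuit_2"
RUNS = 500

failures = 0
for _ in range(RUNS):
    # Each pytest invocation is a fresh process, so when no seed is set the
    # global NumPy random state is initialized differently on every run.
    completed = subprocess.run(["pytest", "-q", TEST_ID], capture_output=True)
    failures += completed.returncode != 0

print(f"failure rate: {failures / RUNS:.1%}")
```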
Motivation
Some tests can be flaky with high failure rates, but the flakiness goes undetected while the seeds are set. We are trying to stabilize such tests.
Environment
The tests were run using pytest 6.2.2 in a conda environment with Python 3.6.13. The OS used was Ubuntu 16.04.
Discussion
The flakiness appears to stem from the random selection of circuit operations, which the seed-setting code normally makes deterministic. For example, test_clifford_circuit_2 checks the value of sum(result.measurements['0'])[0] at the end of the test and ensures that it is between 20 and 80. When the seed-setting code is present, this value is always 49, because the circuit being simulated is the same on every run. When the seed-setting code is removed, however, the circuit being simulated varies from run to run, and the value of sum(result.measurements['0'])[0] is not always between 20 and 80. test_example_runs_bb84 is flaky when the seed-setting code is removed for a similar reason.
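To make the mechanism concrete, here is a simplified analogue of the pattern described above; it is not the actual test code, and the gate set, circuit depth, and helper name are invented for illustration.

```python
# Simplified analogue (not the real Cirq test): gates are chosen through the
# global NumPy random state, so fixing the seed pins both the circuit and the
# resulting measurement counts.
import cirq
import numpy as np

def build_random_circuit(qubits, depth=20):
    circuit = cirq.Circuit()
    for _ in range(depth):
        # The gate choice consumes the global np.random state.
        choice = np.random.randint(3)
        if choice == 0:
            circuit.append(cirq.X(qubits[0]))
        elif choice == 1:
            circuit.append(cirq.H(qubits[0]))
        else:
            circuit.append(cirq.CNOT(qubits[0], qubits[1]))
    circuit.append(cirq.measure(qubits[0], key='0'))
    return circuit

np.random.seed(0)  # commenting this out makes the circuit differ on every run
qubits = cirq.LineQubit.range(2)
result = cirq.CliffordSimulator().run(build_random_circuit(qubits), repetitions=100)

# With the seed fixed, this count is identical across runs; without it, some
# generated circuits leave the measured qubit in a computational-basis state,
# pushing the count to 0 or 100 and outside a 20..80 assertion window.
print(sum(result.measurements['0'])[0])
```

With the global seed in place, the assertion is really checking a single, fixed circuit, which is why the count observed in the test never changes.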
We would be interested in learning whether setting the seed means that only a restricted class of circuit operations is ever selected. We would also be interested in learning whether there are ways of addressing the seed-setting code in these tests. We would be happy to raise a Pull Request to fix the tests and to incorporate any feedback that you may have.
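One possible direction, sketched below, is to draw the circuit from an explicit local generator and seed the simulator with it, instead of seeding the global NumPy state. This is only an illustration of the idea, not the change that was eventually merged, and it assumes the simulator accepts a seed/random-state argument, as cirq.CliffordSimulator does.

```python
# Sketch of avoiding the global NumPy state: use a local RandomState for
# circuit selection and pass it to the simulator as its seed.
import cirq
import numpy as np

prng = np.random.RandomState(0)           # local, test-scoped generator
qubits = cirq.LineQubit.range(2)

circuit = cirq.Circuit()
for _ in range(20):
    # Gate choices come from prng, not from np.random's global state.
    gate = [cirq.X, cirq.H, cirq.S][prng.randint(3)]
    circuit.append(gate(qubits[0]))
circuit.append(cirq.measure(qubits[0], key='0'))

# CliffordSimulator accepts a seed / random-state argument, making the
# measurement sampling reproducible as well.
simulator = cirq.CliffordSimulator(seed=prng)
result = simulator.run(circuit, repetitions=100)

# The count below is reproducible run-to-run because every source of
# randomness is tied to prng; the test's statistical assertion can stay loose.
print(sum(result.measurements['0'])[0])
```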
Comments (8 total, 3 by maintainers)
This seems reasonable, although we should first confirm that the current circuit generates a Bell state (i.e. an equal superposition of |0⟩ and |1⟩).
I'd prefer to keep the looser assertion on this, even though specifying a seed will enforce a specific result. The reason is that we want the test to capture our expectations: if the circuit produces a Bell state, measuring it has a 50% chance of producing a zero, so our expectation is that the number of zeros measured is, e.g., in the range 40 < x < 60.
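For a sense of how loose these bounds are when the circuit really behaves as a 50/50 source, the two-sided binomial tail probabilities can be computed directly. The sketch below assumes 100 repetitions (inferred from the 20..80 window quoted earlier, not stated explicitly in the issue) and assumes scipy is available.

```python
# Probability that a fair-coin count over 100 repetitions violates each
# assertion window; the repetition count of 100 is an assumption.
from scipy.stats import binom

n, p = 100, 0.5
# assertion "40 < x < 60" fails when x <= 40 or x >= 60
fail_tight = binom.cdf(40, n, p) + binom.sf(59, n, p)
# assertion "20 < x < 80" fails when x <= 20 or x >= 80
fail_loose = binom.cdf(20, n, p) + binom.sf(79, n, p)
print(f"P(40 < x < 60 fails) = {fail_tight:.3f}")   # a few percent
print(f"P(20 < x < 80 fails) = {fail_loose:.2e}")   # negligible
```

So with the circuit pinned by a seed (or otherwise guaranteed to produce an equal superposition), the looser 20 < x < 80 assertion is essentially never violated while still expressing the statistical expectation.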
@melonwater211, would you like to take on fixing this issue?
I think we can close this now, since our tests now use a fixed random seed and the function uses a fixed random generator for that test.