Unit tests are >= 30 mins, too long to run on every commit

Now that we increased the sampling in the operator estimation tests again, the unit tests take at least 30 mins (I’m seeing 40+ right now as I type this). That is way too long to run on every commit. First things that come to mind for potential solutions are:

Run short operator estimation tests on commits, and run the longer tests nightly
Only run the operator estimation tests when operator_estimation.py changes (might be hairy: to be 100% safe we may also need to trigger on all of the file’s dependencies, and that could blow up quickly; need to check)
Run the operator estimation tests separately, so they get their own CI runners (this doesn’t completely solve the problem, but you’d get a green check mark for all the other files in ~5 mins, which may unblock people even while the operator estimation tests are still running)
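
As a rough illustration of the second option, here is a minimal sketch of a check a CI script could run against the list of changed files. The function name, the WATCHED set, and the file paths in it are assumptions made for this example, not something taken from the repository, and the sketch deliberately ignores the dependency-tracking problem raised above.

```python
# Hypothetical, hand-maintained set of files whose changes should trigger
# the slow operator estimation tests. The paths are assumed for illustration.
WATCHED = {
    "pyquil/operator_estimation.py",
    "pyquil/tests/test_operator_estimation.py",
}


def needs_operator_estimation_tests(changed_files, watched=WATCHED):
    """Return True if any changed file is in the watched set."""
    return not set(changed_files).isdisjoint(watched)


# Example: a commit touching the module triggers the slow tests,
# a docs-only commit does not.
print(needs_operator_estimation_tests(["pyquil/operator_estimation.py", "README.md"]))
print(needs_operator_estimation_tests(["docs/index.rst"]))
```

A plain set keeps the rule easy to audit, but it only stays safe as long as someone adds every dependency of operator_estimation.py to it by hand, which is exactly the maintenance cost the issue is worried about.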

Created 4 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

appleby commented, Sep 3, 2019

Many of the slowest operator estimation tests have a similar structure: build some TomographyExperiment, call measure_observables on it in a loop 100 times, then check at the end whether the mean result is within some absolute tolerance of the expected value.

Maybe this invalidates the tests idk, but one way to speed it up would be to instead run the loop in batches of say 20 iterations, then check after each batch to see if the value has converged to within the tolerance, rather than always waiting the full 100 iterations. Obviously this doesn’t help the worst-case, but would probably speed up the average case, assuming that stopping early doesn’t invalidate the test. Perhaps @msohaibalam can comment on whether this is a valid approach.
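
The batching idea above can be sketched roughly as follows. This is only an illustration: estimate_with_early_stop and sample_once are hypothetical names, and sample_once stands in for one measure_observables call (here it is just a seeded Gaussian draw, not a real measurement).

```python
import random
from statistics import fmean


def estimate_with_early_stop(sample_once, expected, atol=2e-2,
                             batch_size=20, max_iters=100):
    """Average repeated samples in batches, stopping once the running
    mean is within atol of the expected value."""
    results = []
    mean = float("nan")
    converged = False
    while len(results) < max_iters:
        # Run one batch, never exceeding the overall iteration budget.
        n = min(batch_size, max_iters - len(results))
        results.extend(sample_once() for _ in range(n))
        mean = fmean(results)
        if abs(mean - expected) <= atol:
            converged = True
            break
    return mean, len(results), converged


# Stand-in for one measure_observables call: a noisy estimate around 0.9.
rng = random.Random(0)
mean, iters, converged = estimate_with_early_stop(
    lambda: rng.gauss(0.9, 0.05), expected=0.9
)
print(iters, converged)
```

One caveat that bears on the "does this invalidate the test" question: checking after every batch and stopping at the first pass gives the test several chances to succeed, so it is strictly more lenient than a single check after all 100 iterations.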

For example, I modified the loop in test_2q_unitary_channel_fidelity_readout_error to collect intermediate results and plotted the estimated fidelity against the number of loop iterations. Here are results for test runs of 100 and 200 iterations:

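[Plot: estimated fidelity vs. number of loop iterations, 100-iteration run (pyquil_966_est_fidel_100_iter)]

[Plot: estimated fidelity vs. number of loop iterations, 200-iteration run (pyquil_966_est_fidel_200_iter)]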

In the case of test_2q_unitary_channel_fidelity_readout_error, the test compares against the expected fidelity with an absolute tolerance of 2e-2 (plus a smaller relative-tolerance factor). Unless I’m misunderstanding (or my plots are wrong lol), it looks like the tolerance is sufficiently loose that the fidelity estimate is within the bounds right from the off (for these two runs anyway). Which is to say that both of the above runs would have passed the tolerance test after 20 iterations.
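
For reference, a check that combines an absolute tolerance with a relative factor usually takes the form |estimate - expected| <= atol + rtol * |expected|, which is the rule numpy.isclose documents. The sketch below writes it out in plain Python. The atol of 2e-2 comes from the comment above, while the rtol value is only a placeholder, since the comment does not say what the test actually uses.

```python
def within_tolerance(estimate, expected, atol=2e-2, rtol=1e-5):
    """Combined absolute/relative tolerance check.

    rtol=1e-5 is a placeholder value, not taken from the test suite.
    """
    return abs(estimate - expected) <= atol + rtol * abs(expected)


# An estimate 0.015 away from the expected fidelity passes,
# one 0.03 away does not.
print(within_tolerance(0.915, 0.9))
print(within_tolerance(0.93, 0.9))
```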

Here is another run of 100 iterations against a different random quil program with different expected_fidelity. This one converges more slowly, but is still always within the 0.02 tolerance range.

[Plot: estimated fidelity vs. number of loop iterations, second program, 100-iteration run (pyquil_966_est_fidel_prog2_100_iter)]

appleby commented, Sep 5, 2019

Having now sat through multiple full-length runs of the test suite, I would no longer stand in the way of common sense and/or progress if we switched the default to --use-seed. I probably should have just listened to @msohaibalam from the start!

To be fair, the original PR discussion talked about reducing the operator est. test time from ~5 min -> ~30s, and I didn’t realize at the time that the number of loop iterations for the slow version of the tests was simultaneously increasing by 4x, resulting in the run time of the slow version increasing from ~5 min -> ~20 min. It was all there in front of me in the diff of course, but I didn’t notice it.