How to test visualization?
We need to improve the way we test visualization; that is not up for debate. However, it is not obvious how.
Compare images
The way we compare images is implemented in QiskitVisualizationTestCase.assertImagesAreEqual (test/python/visualization/visualization.py). As defined, it is very unstable: how images are generated depends heavily on uncontrolled factors, like the available fonts. At the same time, it does not seem sensitive enough. Tolerating some difference (in order to handle those uncontrolled factors) makes relevant differences hard to detect. If we are willing to reduce the tolerance to the point that semantic changes are visible, we need to treat the CI as the “ground truth”. For that, we need to save mismatching images and update the references (using PublishBuildArtifacts, as done here).
Mock the drawing libraries
For the latex drawer, comparing the latex source seems the way to go. For the matplotlib case, we should be able to mock matplotlib.figure.Figure, but I don't know how complicated that would be.
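Comparing latex source could look roughly like the sketch below: normalize away comments, trailing whitespace and blank lines, then show a unified diff so a failing test points at the exact changed line. The helper names and the snippets are made up for illustration; in Qiskit the generated string could come from `circuit_drawer(circuit, output="latex_source")`, but how references would be stored is an assumption here.

```python
import difflib
import re

# Simplified: an unescaped "%" starts a LaTeX comment ("\%" is a literal percent).
_COMMENT = re.compile(r"(?<!\\)%.*$")


def normalize_latex(source):
    """Non-blank lines of `source` with comments and trailing whitespace removed."""
    lines = []
    for line in source.splitlines():
        line = _COMMENT.sub("", line).rstrip()
        if line:
            lines.append(line)
    return lines


def latex_source_diff(reference, generated):
    """Unified diff of two LaTeX sources after normalization; [] means they match."""
    return list(
        difflib.unified_diff(
            normalize_latex(reference),
            normalize_latex(generated),
            fromfile="reference.tex",
            tofile="generated.tex",
            lineterm="",
        )
    )


# Made-up snippets in the style of a qcircuit-based drawer.
reference = r"""
% reference for a single Hadamard
\Qcircuit @C=1.0em @R=0.2em {
 \lstick{q_0} & \gate{H} & \qw \\
}
"""
regenerated = r"""
\Qcircuit @C=1.0em @R=0.2em {   % same content, different comment and spacing

 \lstick{q_0} & \gate{H} & \qw \\
}
"""
changed = reference.replace(r"\gate{H}", r"\gate{X}")

print(latex_source_diff(reference, regenerated))  # []
print("\n".join(latex_source_diff(reference, changed)))
```

Unlike pixels, the source does not depend on installed fonts, and the diff is readable in a CI log.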
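For the mocking idea, one possible shape is sketched below. Rather than patching `matplotlib.figure.Figure` itself, it passes a `unittest.mock.MagicMock` in place of an Axes-like object and asserts on the recorded drawing calls. `draw_cnot` is a hypothetical mini-drawer, not how Qiskit's matplotlib drawer is actually structured.

```python
from unittest import mock


def draw_cnot(ax, x, control_y, target_y):
    """Hypothetical mini-drawer that only talks to a Matplotlib-style Axes object."""
    # Vertical wire joining control and target.
    ax.plot([x, x], [control_y, target_y], color="black", zorder=1)
    # Filled control dot.
    ax.scatter([x], [control_y], color="black", zorder=2)
    # Target symbol, drawn here as a text glyph for simplicity.
    ax.text(x, target_y, "\u2295", ha="center", va="center", zorder=2)


# A MagicMock records every method call instead of rendering pixels.
ax = mock.MagicMock()
draw_cnot(ax, x=2, control_y=0, target_y=1)

expected_calls = [
    mock.call.plot([2, 2], [0, 1], color="black", zorder=1),
    mock.call.scatter([2], [0], color="black", zorder=2),
    mock.call.text(2, 1, "\u2295", ha="center", va="center", zorder=2),
]
assert ax.method_calls == expected_calls
```

With Matplotlib installed, `mock.create_autospec(matplotlib.axes.Axes, instance=True)` would additionally reject calls to methods that do not exist on a real Axes. The trade-off is that such tests are tied to the exact sequence of drawing calls, so a harmless refactor of the drawer can break them.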
Any other ideas?
Created 4 years ago
^ ditto. We are not testing against any ground “truth”, just against some status quo. Catching actual visualization bugs has always come down to a user noticing something is off and reporting it. So why not just remove these painful tests?
So my issue with image comparison tests, beyond their fragility under a lot of environmental factors (including, but not limited to, the mpl backend, which is hardcoded for testing in #2949), is that they don't actually test that things are correct. Image comparison tests only test the status quo, which may or may not be correct: our reference images encode the current output, not what we consider correct. We've had instances in the past where a bug with barriers was baked into the reference images and we had no idea. Another perfect example is #3052, which would fail latex image comparison tests (or latex source comparison tests) if we had them, even though the output with #3052 is objectively more correct in pretty much every case. The tests don't tell us whether we have a bug; they just indicate that we've changed something, which doesn't seem like much of a value add, especially when weighed against their general instability.
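The status-quo argument can be reduced to a toy golden-output test. Everything below is hypothetical (the renderers and the text-art output are made up); it only illustrates that such a test passes when the recorded reference already contains a bug and fails when that bug is fixed.

```python
def snapshot_matches(render, reference):
    """A golden-output test: passes iff the renderer reproduces the stored reference."""
    return render() == reference


# Reference recorded from the current (hypothetically buggy) drawer: the barrier is missing.
REFERENCE = "q_0: -H--\nq_1: ----"


def current_render():
    # Buggy output, identical to what was recorded, so the test passes.
    return "q_0: -H--\nq_1: ----"


def fixed_render():
    # Correct output that now draws the barrier, so the test fails.
    return "q_0: -H-|-\nq_1: ---|-"


print(snapshot_matches(current_render, REFERENCE))  # True
print(snapshot_matches(fixed_render, REFERENCE))  # False
```

The test signals a change, not a bug: the only way to make it pass again is to re-record the reference.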