[Question] Visual testing in docker on different CPU architecture
Hey team 👋🏼
I am working to migrate my puppeteer visual regression testing to playwright.
My team has people working on Macs with either arm64 (M1 SoC) or amd64 (Intel) CPU architecture.
I’d like a way to run and update playwright tests/screenshots locally from either architecture and have the local screenshots match the screenshots produced in the CI (linux/amd64).
Currently we use the mcr.microsoft.com/playwright:vx.x.x-focal docker image to run tests both locally and in the CI. However, running on these different architectures produces screenshots that differ ever so slightly; the differences are virtually imperceptible.
Screenshot from M1 mac - arm64
Screenshot from Intel mac - amd64
Diff screenshot
Diff gif
So my question is: does anyone have a good strategy to avoid the above errors on these two architectures without reducing the threshold?
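To make the threshold trade-off concrete, here is a minimal sketch of a pixel comparison with a per-channel tolerance and a maximum diff ratio. It is illustrative only and is not Playwright's actual comparator; the names `diffImages`, `screenshotsMatch`, `channelTolerance` and `maxDiffRatio` are made up for this example.

```typescript
// Illustrative only: NOT Playwright's comparator.
// Images are flat RGBA byte arrays of identical size.

interface DiffResult {
  diffPixels: number; // pixels whose delta exceeds the tolerance
  diffRatio: number;  // diffPixels / total pixels
}

/** Count pixels whose largest per-channel delta exceeds `channelTolerance` (0-255). */
function diffImages(
  expected: Uint8Array,
  actual: Uint8Array,
  channelTolerance: number,
): DiffResult {
  if (expected.length !== actual.length || expected.length % 4 !== 0) {
    throw new Error("images must be RGBA buffers of identical size");
  }
  const totalPixels = expected.length / 4;
  let diffPixels = 0;
  for (let i = 0; i < expected.length; i += 4) {
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxDelta = Math.max(maxDelta, Math.abs(expected[i + c] - actual[i + c]));
    }
    if (maxDelta > channelTolerance) diffPixels++;
  }
  return {
    diffPixels,
    diffRatio: totalPixels === 0 ? 0 : diffPixels / totalPixels,
  };
}

/** Pass when the share of differing pixels stays within `maxDiffRatio`. */
function screenshotsMatch(
  expected: Uint8Array,
  actual: Uint8Array,
  channelTolerance: number,
  maxDiffRatio: number,
): boolean {
  return diffImages(expected, actual, channelTolerance).diffRatio <= maxDiffRatio;
}

// A 2x2 image where one anti-aliased edge pixel is slightly lighter on the other machine.
const expectedPixels = new Uint8Array([
  255, 255, 255, 255,   128, 128, 128, 255,
  255, 255, 255, 255,     0,   0,   0, 255,
]);
const actualPixels = new Uint8Array([
  255, 255, 255, 255,   140, 140, 140, 255,
  255, 255, 255, 255,     0,   0,   0, 255,
]);

const strictResult = diffImages(expectedPixels, actualPixels, 0);
const strictPass = screenshotsMatch(expectedPixels, actualPixels, 0, 0);
const tolerantPass = screenshotsMatch(expectedPixels, actualPixels, 16, 0);
```

With an exact comparison the single edge pixel is enough to fail the pair, while a tolerance of 16 per channel lets it through. The same tolerance would also hide a genuine regression of similar magnitude, which is the reason for wanting to keep the threshold where it is.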
I’ve tried running docker with --platform=linux/amd64 on my M1 mac, but I run into https://github.com/microsoft/playwright/issues/13724#issuecomment-1112358113 when running the tests, even on the latest docker version (v20.10.8) with Rosetta 2 installed. Sounds like this could just be a known issue with docker.
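For reference, the kind of invocation I mean could be assembled roughly like this. It is only a sketch: `dockerRunArgs`, `needsEmulation` and the `/work` mount point are hypothetical names chosen for the example, and the image tag is left as the same vx.x.x placeholder used above. The docker flags themselves (`--rm`, `--platform`, `--ipc=host`, `-v`, `-w`) and Playwright's `--update-snapshots` are real options.

```typescript
// Hypothetical helper: builds the argument list for `docker run` so that
// every machine requests the same platform as the CI.
const CI_PLATFORM = "linux/amd64";

function dockerRunArgs(
  image: string,
  projectDir: string,
  updateSnapshots = false,
): string[] {
  const args = [
    "run",
    "--rm",
    `--platform=${CI_PLATFORM}`, // force the CI architecture, emulated if needed
    "--ipc=host",
    "-v", `${projectDir}:/work`, // mount the project into the container
    "-w", "/work",
    image,
    "npx", "playwright", "test",
  ];
  if (updateSnapshots) args.push("--update-snapshots");
  return args;
}

/** True when the host CPU differs from the CI architecture, so docker must emulate. */
function needsEmulation(hostArch: string): boolean {
  // Node's process.arch reports "arm64" on M1 and "x64" on Intel.
  return hostArch !== "x64";
}

const exampleArgs = dockerRunArgs(
  "mcr.microsoft.com/playwright:vx.x.x-focal", // placeholder tag, as in the post
  "/path/to/project",                          // assumed project location
  true,
);
```

The resulting array can be handed to something like Node's `child_process.spawnSync("docker", exampleArgs, { stdio: "inherit" })`. On an arm64 host this is exactly the emulated path that runs into the issue linked above, so it shows the attempt rather than a working fix.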
Issue Analytics
- Created a year ago
- Reactions:4
- Comments:37 (23 by maintainers)
@nickofthyme Thank you again for your repro. I was able to get 247 pairs of actual/expected screenshots that all fail for me due to anti-aliasing. Hopefully we’ll come up with something to fix this.
Everybody: if you have examples of PNG screenshots that are taken on the same browser and same OS yet are different due to anti-aliasing artifacts, could you please attach the “expected”, “actual” and “diff” images here?
This information will help with our experiments with fighting browser rendering non-determinism.
@gselsidi thank you for checking!
The new comparator is designed to handle browser rendering non-determinism, so as long as the page layout is exactly the same, the screenshots should pass now.
Do note, though, that for the layout to be the same, font metrics must be the same as well. Otherwise, line wrapping might happen in different places, boxes will have different sizes, and the images will in fact be different even to the human eye.
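A toy illustration of the font-metrics point, assuming a greedy word wrap and a single average glyph width per font. Real text layout is far more involved; `wrapText` and the widths 7.0 and 7.3 are invented for the example.

```typescript
/** Greedy word wrap using one average glyph width (a deliberate simplification). */
function wrapText(text: string, boxWidthPx: number, glyphWidthPx: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    // Keep the word on the current line if it fits, or if the line is still empty.
    if (current === "" || candidate.length * glyphWidthPx <= boxWidthPx) {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

const sampleText = "visual regression tests need identical fonts everywhere";

// Same text, same 200px box, two fonts whose glyphs differ by 0.3px on average.
const linesFontA = wrapText(sampleText, 200, 7.0);
const linesFontB = wrapText(sampleText, 200, 7.3);
```

In this sketch font A fits the text on two lines, while font B wraps "need" early and ends up with three lines. The box grows by a full line height, which is a real layout change rather than a rendering artifact, so no comparator should be expected to accept it.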
According to some exploration I did back in the day, headless vs headed rendering differences consist of the following nuances:
We didn’t aim to fix this, though, and we haven’t experimented with it yet.
This won’t work since browsers inside and outside docker will use different font stacks, resulting in different font metrics, resulting in different layout, and finally, yielding visually different screenshots.