Fix flaky tests that have recently been popping up
Since https://github.com/pytorch/vision/pull/4497 was merged, we have been observing a few tests that fail randomly.
Before https://github.com/pytorch/vision/pull/4497, these tests were almost always run with the same RNG state, which was set by a test executed earlier in the suite. Now that all tests are properly independent and the RNG state no longer leaks between them, these tests run with a fresh RNG state on each execution, and if they are unstable they may fail.
(Note: this is a good thing; it’s better to know that they fail now rather than when submitting an unrelated PR, which is what happened in https://github.com/pytorch/vision/pull/3032#issuecomment-734829336.)
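As a rough sketch of what per-test RNG isolation can look like (an assumed autouse fixture, not necessarily the exact mechanism introduced by #4497):

```python
import pytest
import torch

# Assumed illustration of per-test RNG isolation: snapshot the global RNG
# state before each test and restore it afterwards, so a test that consumes
# random numbers cannot change the state seen by later tests.
@pytest.fixture(autouse=True)
def dont_leak_rng_state():
    state = torch.get_rng_state()
    yield
    torch.set_rng_state(state)
```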
For each of these tests we should find out whether the flakiness is severe or not. A simple solution is to parametrize the test over 100 or 1000 random seeds and check the failure rate, as sketched below. If the failure rate is reasonable we can just set a seed with torch.manual_seed(). If not, we should try to fix the test and make it more robust.
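A minimal sketch of that seed-sweep approach, assuming pytest and a placeholder check (not the actual body of any torchvision test):

```python
import pytest
import torch
from torchvision import transforms

# Hypothetical stability sweep: run the same check under many seeds and let
# the failure count indicate how flaky the underlying assertion is.
@pytest.mark.parametrize("seed", range(100))
def test_random_horizontal_flip_stability(seed):
    torch.manual_seed(seed)
    img = torch.rand(3, 16, 16)
    out = transforms.RandomHorizontalFlip(p=0.5)(img)
    # Placeholder assertion: the output is either the input or its horizontal flip.
    assert torch.equal(out, img) or torch.equal(out, img.flip(-1))
```

If only a handful of the 100 seeds fail, pinning a single known-good seed in the real test is usually acceptable; if many fail, the test itself needs to be made more robust.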
The list of tests so far is:
- test_random_apply - https://github.com/pytorch/vision/pull/4756
- test_stochastic_depth[row-0.2] - test.test_ops.TestStochasticDepth - #4758
- test_randomperspective_fill[L] - https://github.com/pytorch/vision/pull/4759
- test_randomperspective_fill[RGB] - https://github.com/pytorch/vision/pull/4759
- test_randomperspective_fill[F] - https://github.com/pytorch/vision/pull/4759
- test_random_vertical_flip - https://github.com/pytorch/vision/pull/4756
- test_random_horizontal_flip - https://github.com/pytorch/vision/pull/4756
- test_frozenbatchnorm2d_eps - #4761
- test_batched_nms_implementations - #4766
- test_backward[True-cpu] - test.test_ops.TestRoiPool - #4763
- test_backward[False-cpu] - test.test_ops.TestRoiPool - #4763
- test_random_erasing - https://github.com/pytorch/vision/pull/4764
- test_color_jitter_hue[hue2-3-cpu] - #4762
- test_color_jitter_contrast[1.5-3-cuda] - #4762
cc @pmeier
Top GitHub Comments
It’s the unstable sort. See https://github.com/pytorch/vision/pull/4766#issuecomment-952996259
I think it’s worth understanding why the open-source contributor couldn’t make the sort stable (he was facing a segfault, if I remember correctly). Fixing the sort would fix a lot of instability in the detection models, so it’s definitely worthwhile.
It would be interesting to figure out whether the 6 failures correspond to a specific edge case, but I wouldn’t spend too much time on it either.
It could just be some ties in the sorting (which is not a stable sort)?
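For illustration only (a toy snippet, not the torchvision NMS code): tied scores combined with a non-stable sort can produce a different ordering than a stable one, which is enough for two otherwise-equivalent implementations to keep different boxes.

```python
import torch

# Toy example of sort instability with ties: with several equal scores, the
# default sort gives no guarantee about the order of the tied entries,
# whereas stable=True preserves their input order.
scores = torch.tensor([0.5, 0.9, 0.5, 0.5])

_, default_order = torch.sort(scores, descending=True)              # tie order unspecified
_, stable_order = torch.sort(scores, descending=True, stable=True)  # ties keep input order

print(default_order)  # may vary between runs/backends
print(stable_order)   # tensor([1, 0, 2, 3])
```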