test_nms_cuda is flaky
🐛 Bug
================================== FAILURES ===================================
___________________________ NMSTester.test_nms_cuda ___________________________

self = <test_ops.NMSTester testMethod=test_nms_cuda>

    @unittest.skipIf(not torch.cuda.is_available(), "CUDA unavailable")
    def test_nms_cuda(self):
        err_msg = 'NMS incompatible between CPU and CUDA for IoU={}'
        for iou in [0.2, 0.5, 0.8]:
            boxes, scores = self._create_tensors_with_iou(1000, iou)
            r_cpu = ops.nms(boxes, scores, iou)
            r_cuda = ops.nms(boxes.cuda(), scores.cuda(), iou)
>           self.assertTrue(torch.allclose(r_cpu, r_cuda.cpu()), err_msg.format(iou))
E           RuntimeError: The size of tensor a (461) must match the size of tensor b (460) at non-singleton dimension 0

test\test_ops.py:403: RuntimeError
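For context, the failure surfaces as a RuntimeError rather than an assertion message because torch.allclose broadcasts its inputs: when the CPU and CUDA NMS keep different numbers of boxes (461 vs 460 here), the result tensors cannot be broadcast together. A minimal illustration:

```python
import torch

# torch.allclose cannot broadcast tensors of length 461 and 460,
# so the comparison raises instead of returning False.
torch.allclose(torch.zeros(461), torch.zeros(460))
# RuntimeError: The size of tensor a (461) must match the size of tensor b (460) ...
```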
I looked at this some more…

If devIoU() is changed as discussed in https://github.com/pytorch/vision/pull/2044 (use a division while calculating IoU, similar to the CPU calculation), then it's possible to compare the overlap values calculated by CPU vs CUDA.

With that change, the overlap values calculated by CPU and CUDA usually agree exactly, but very rarely (over 1000 seeds tried) they can differ from one another by up to 4 ULP.

Even when the calculated overlaps differ, the testcase will still pass unless the overlap values straddle the threshold (i.e. one overlap greater than the threshold, the other not).
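For illustration, here is a tiny float32 sketch of that straddle case; the numbers are made up rather than taken from the test. A one-ULP difference only matters when the CPU and CUDA overlaps land on opposite sides of the IoU threshold, at which point the two implementations keep different boxes and the kept-index tensors end up with different lengths.

```python
import numpy as np

# Hypothetical overlap values, one ULP apart, sitting right at the threshold.
thresh = np.float32(0.5)
iou_cpu = np.float32(0.5)                          # e.g. the CPU overlap lands exactly on the threshold
iou_cuda = np.nextafter(iou_cpu, np.float32(1.0))  # the CUDA overlap is one ULP higher

# The > comparison flips, so one implementation suppresses the box and the other keeps it.
print(iou_cpu > thresh, iou_cuda > thresh)         # False True
```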
If NVCC's fused multiply-add optimization is disabled (e.g. by adding --fmad=false to NVCC_FLAGS), then the overlaps calculated by CPU and CUDA agree exactly, and the testcase does not fail (even if PR 2044's change to _create_tensors_with_iou() is reverted).
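The NVCC_FLAGS above refers to torchvision's own build setting. As a generic illustration of the mechanism, here is how such an nvcc flag could be passed to PyTorch's extension builder; this is only a sketch with hypothetical names, not torchvision's actual setup.py.

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="nms_ext",  # hypothetical extension name
    ext_modules=[
        CUDAExtension(
            name="nms_ext",
            sources=["nms.cpp", "nms_kernel.cu"],  # hypothetical source files
            extra_compile_args={
                "cxx": ["-O3"],
                # Disable nvcc's fused multiply-add so CUDA rounding matches the CPU.
                "nvcc": ["-O3", "--fmad=false"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```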
Disabling fmad without changing devIoU() improves things (reduces the failure rate to about 25%), but does not prevent the problem entirely.

NVCC enables fmad by default. It presumably benefits performance.
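As a rough illustration of why fmad changes results: a fused multiply-add rounds once where a separate multiply and add round twice. The Python sketch below emulates the fused rounding with float64 (exact for these inputs); the real effect in the kernel comes from nvcc contracting the float32 operations, but the rounding behaviour is analogous.

```python
import numpy as np

x = np.float32(1.0 + 2.0 ** -12)   # exactly representable in float32
one = np.float32(1.0)

# Separate multiply then subtract: x*x is rounded to float32 before the subtraction.
unfused = x * x - one

# Emulated fused multiply-add: compute in float64 (exact here) and round to float32 once.
fused = np.float32(np.float64(x) * np.float64(x) - np.float64(one))

print(unfused, fused, unfused == fused)   # the two float32 results differ in the last bits
```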
Disabling NVCC's precise division optimization (--prec-div) did not affect the results at all.

To summarize:
- changing devIoU() to use similar division reduces the CPU/CUDA disagreements, but is not sufficient on its own
- additionally disabling the fmad optimization prevents the failure
- but disabling fmad may affect performance

So maybe…
- change devIoU() to use similar division as CPU
- leave fmad to default to enabled
- have _create_tensors_with_iou() force generated data a bit away from the threshold (a rough sketch of this last idea is below)
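To make the last point concrete, here is a rough sketch (not the actual torchvision helper) of a _create_tensors_with_iou() that nudges the target IoU slightly past the requested threshold, so a few-ULP disagreement between CPU and CUDA can no longer flip the comparison:

```python
import torch

def _create_tensors_with_iou(N, iou_thresh):
    # Hypothetical sketch of the test helper discussed above.
    boxes = torch.rand(N, 4) * 100   # random (x1, y1, x2, y2) boxes
    boxes[:, 2:] += boxes[:, :2]     # ensure x2 >= x1 and y2 >= y1
    # Make the last box overlap the first with roughly the requested IoU,
    # but push the target slightly past the threshold so tiny rounding
    # differences cannot straddle it.
    boxes[-1, :] = boxes[0, :]
    x0, y0, x1, y1 = boxes[-1].tolist()
    iou_thresh += 1e-5
    boxes[-1, 2] += (x1 - x0) * (1 - iou_thresh) / iou_thresh
    scores = torch.rand(N)
    return boxes, scores
```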
If that sounds OK, I can put up a PR for the devIoU() change.

Thanks @hartb! This has been fixed now.