Bug in numbering during retries encountered
For some reason this strange output can occur:
[ [32m OK[0m ] (231/279) FlexAlltoallTest on arolla:pn using PrgEnv-gnu [compile: 0.559s run: 21.334s total: 2072.256s]
[ [32m OK[0m ] (232/279) HaloCellExchangeTest on arolla:cn using PrgEnv-pgi [compile: 0.627s run: 21.459s total: 2083.825s]
[ [32m OK[0m ] (233/279) FlexAlltoallTest on arolla:pn using PrgEnv-pgi [compile: 0.777s run: 21.169s total: 2094.441s]
[ [32m OK[0m ] (234/279) HaloCellExchangeTest on arolla:cn using PrgEnv-gnu [compile: 0.435s run: 21.199s total: 2105.736s]
[ [32m OK[0m ] (235/279) DGEMMTest on arolla:pn using PrgEnv-gnu-nompi [compile: 0.451s run: 22.423s total: 2120.285s]
[ [32m OK[0m ] (236/279) KernelLatencyTest_sync on arolla:cn using PrgEnv-pgi [compile: 1.689s run: 15.531s total: 2125.056s]
[ [32m OK[0m ] (237/279) DGEMMTest on arolla:cn using PrgEnv-gnu-nompi [compile: 0.378s run: 10.441s total: 2136.125s]
[ [32m OK[0m ] (238/279) AllocSpeedTest_no on arolla:cn using PrgEnv-gnu [compile: 0.631s run: 10.444s total: 2147.662s]
[ [32m OK[0m ] (239/279) OpenaccCudaCpp on arolla:cn using PrgEnv-pgi [compile: 3.897s run: 5.305s total: 2157.925s]
[ [32m OK[0m ] (240/279) MultiDeviceOpenaccTest on arolla:cn using PrgEnv-pgi [compile: 1.740s run: 5.326s total: 2165.140s]
[ [32m OK[0m ] (241/279) GpuDirectCudaCheck on arolla:cn using PrgEnv-pgi [compile: 2.503s run: 5.313s total: 2173.099s]
[ [32m OK[0m ] (242/279) GpuDirectCudaCheck on arolla:cn using PrgEnv-gnu [compile: 1.761s run: 5.357s total: 2180.361s]
[ [32m OK[0m ] (243/279) GpuDirectAccCheck on arolla:cn using PrgEnv-pgi [compile: 0.542s run: 5.315s total: 2186.365s]
[ [32m OK[0m ] (244/279) CudaStressTest on arolla:cn using PrgEnv-pgi-nompi [compile: 1.473s run: 5.312s total: 2193.307s]
[ [32m OK[0m ] (245/279) CudaStressTest on arolla:cn using PrgEnv-gnu-nompi [compile: 1.604s run: 5.387s total: 2202.242s]
[ [32m OK[0m ] (246/279) CudaStressTest on arolla:cn using PrgEnv-gnu [compile: 1.621s run: 5.399s total: 2209.414s]
[ [31m FAIL[0m ] (247/279) HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 15.930s run: 5.384s total: 2230.874s]
[ [31m FAIL[0m ] (248/279) HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 13.805s run: 5.345s total: 2250.166s]
[ [31m FAIL[0m ] (249/279) HaloExchangeTest_default on arolla:cn using PrgEnv-gnu [compile: 13.870s run: 5.343s total: 2269.519s]
[ [31m FAIL[0m ] (250/279) AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 14.785s run: 5.350s total: 2289.798s]
[ [31m FAIL[0m ] (251/279) AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 14.555s run: 5.351s total: 2309.903s]
[ [31m FAIL[0m ] (252/279) AlltoallvTest_default on arolla:cn using PrgEnv-gnu [compile: 14.781s run: 5.345s total: 2330.184s]
[ [31m FAIL[0m ] (253/279) AutomaticArraysCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 0.337s run: n/a total: 2330.644s]
[ [31m FAIL[0m ] (254/279) AutomaticArraysCheck on arolla:cn using PrgEnv-gnu [compile: 0.000s run: n/a total: 2333.403s]
[----------] all spawned checks have finished
[==========] Retrying 7 failed check(s) (retry 1/2)
[----------] started processing AutomaticArraysCheck (AutomaticArraysCheck)
[ [32mRUN [0m ] AutomaticArraysCheck on arolla:cn using PrgEnv-gnu
[ [31m FAIL[0m ] (1/9) AutomaticArraysCheck on arolla:cn using PrgEnv-gnu [compile: 0.000s run: n/a total: 0.018s]
[ [32mRUN [0m ] AutomaticArraysCheck on arolla:cn using PrgEnv-gnu-nompi
[ [31m FAIL[0m ] (2/9) AutomaticArraysCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 0.000s run: n/a total: 0.015s]
[ [32mRUN [0m ] AutomaticArraysCheck on arolla:cn using PrgEnv-pgi-nompi
[ [31m FAIL[0m ] (3/9) AutomaticArraysCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 0.338s run: n/a total: 0.355s]
[----------] finished processing AutomaticArraysCheck (AutomaticArraysCheck)
[----------] started processing AlltoallvTest_default (AlltoallvTest_default)
[ [32mRUN [0m ] AlltoallvTest_default on arolla:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_default (AlltoallvTest_default)
[----------] started processing AlltoallvTest_nocomm (AlltoallvTest_nocomm)
[ [32mRUN [0m ] AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu
[ [32m HOLD[0m ] AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_nocomm (AlltoallvTest_nocomm)
[----------] started processing AlltoallvTest_nocomp (AlltoallvTest_nocomp)
[ [32mRUN [0m ] AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu
[ [32m HOLD[0m ] AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_nocomp (AlltoallvTest_nocomp)
[----------] started processing HaloExchangeTest_default (HaloExchangeTest_default)
[ [32mRUN [0m ] HaloExchangeTest_default on arolla:cn using PrgEnv-gnu
[ [32m HOLD[0m ] HaloExchangeTest_default on arolla:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_default (HaloExchangeTest_default)
[----------] started processing HaloExchangeTest_nocomm (HaloExchangeTest_nocomm)
[ [32mRUN [0m ] HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu
[ [32m HOLD[0m ] HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_nocomm (HaloExchangeTest_nocomm)
[----------] started processing HaloExchangeTest_nocomp (HaloExchangeTest_nocomp)
[ [32mRUN [0m ] HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu
[ [32m HOLD[0m ] HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_nocomp (HaloExchangeTest_nocomp)
[----------] waiting for spawned checks to finish
[ [31m FAIL[0m ] (4/9) AlltoallvTest_default on arolla:cn using PrgEnv-gnu [compile: 14.916s run: 3.517s total: 18.465s]
[ [31m FAIL[0m ] (5/9) HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 13.807s run: 3.376s total: 20.070s]
[ [31m FAIL[0m ] (6/9) HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 13.856s run: 5.858s total: 39.930s]
[ [31m FAIL[0m ] (7/9) HaloExchangeTest_default on arolla:cn using PrgEnv-gnu [compile: 14.041s run: 4.898s total: 59.020s]
[ [31m FAIL[0m ] (8/9) AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 14.736s run: 5.293s total: 79.192s]
[ [31m FAIL[0m ] (9/9) AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 14.641s run: 5.331s total: 99.302s]
[ [32m OK[0m ] (10/9) NetCDFTest_f90_static on arolla:cn using PrgEnv-pgi-nompi [compile: 1.586s run: 5.413s total: 2455.413s]
[ [32m OK[0m ] (11/9) NetCDFTest_f90_static on arolla:cn using PrgEnv-gnu-nompi [compile: 0.841s run: 5.514s total: 2461.916s]
[ [32m OK[0m ] (12/9) NetCDFTest_f90_dynamic on arolla:cn using PrgEnv-pgi-nompi [compile: 0.727s run: 5.461s total: 2468.249s]
[ [32m OK[0m ] (13/9) NetCDFTest_f90_dynamic on arolla:cn using PrgEnv-gnu-nompi [compile: 0.586s run: 5.495s total: 2474.479s]
[ [32m OK[0m ] (14/9) NetCDFTest_c_static on arolla:cn using PrgEnv-pgi-nompi [compile: 0.669s run: 5.464s total: 2480.763s]
[ [32m OK[0m ] (15/9) NetCDFTest_c_static on arolla:cn using PrgEnv-gnu-nompi [compile: 0.621s run: 5.471s total: 2486.995s]
[ [32m OK[0m ] (16/9) NetCDFTest_c_dynamic on arolla:cn using PrgEnv-pgi-nompi [compile: 0.629s run: 5.462s total: 2493.234s]
[ [32m OK[0m ] (17/9) NetCDFTest_c_dynamic on arolla:cn using PrgEnv-gnu-nompi [compile: 0.579s run: 5.496s total: 2499.461s]
[ [32m OK[0m ] (18/9) NetCDFTest_cpp_static on arolla:cn using PrgEnv-pgi-nompi [compile: 1.403s run: 5.542s total: 2506.552s]
[ [32m OK[0m ] (19/9) NetCDFTest_cpp_static on arolla:cn using PrgEnv-gnu-nompi [compile: 1.092s run: 5.543s total: 2513.355s]
[ [32m OK[0m ] (20/9) NetCDFTest_cpp_dynamic on arolla:cn using PrgEnv-pgi-nompi [compile: 1.292s run: 5.560s total: 2520.358s]
[ [32m OK[0m ] (21/9) NetCDFTest_cpp_dynamic on arolla:cn using PrgEnv-gnu-nompi [compile: 0.926s run: 5.562s total: 2526.992s]
[ [32m OK[0m ] (22/9) GpuBandwidthCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 2.199s run: 92.328s total: 2621.801s]
[ [32m OK[0m ] (23/9) CudaSimpleMPICheck on arolla:cn using PrgEnv-gnu [compile: 2.000s run: 5.400s total: 2629.342s]
[ [32m OK[0m ] (24/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 1.732s run: 5.306s total: 2636.522s]
[ [32m OK[0m ] (25/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-pgi [compile: 1.715s run: 5.302s total: 2643.677s]
[ [32m OK[0m ] (26/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 1.865s run: 5.389s total: 2651.071s]
[ [32m OK[0m ] (27/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-gnu [compile: 1.862s run: 5.385s total: 2658.462s]
[ [32m OK[0m ] (28/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 1.680s run: 5.309s total: 2665.592s]
[ [32m OK[0m ] (29/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-pgi [compile: 1.660s run: 5.309s total: 2672.714s]
[ [32m OK[0m ] (30/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 1.841s run: 5.391s total: 2680.085s]
[ [32m OK[0m ] (31/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-gnu [compile: 1.798s run: 5.388s total: 2687.407s]
[ [32m OK[0m ] (32/9) CudaMatrixmulCublasCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 1.799s run: 5.308s total: 2694.655s]
[ [32m OK[0m ] (33/9) CudaMatrixmulCublasCheck on arolla:cn using PrgEnv-pgi [compile: 1.841s run: 5.300s total: 2701.934s]
[ [32m OK[0m ] (34/9) CudaMatrixmulCublasCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 1.933s run: 5.437s total: 2709.481s]
[----------] all spawned checks have finished
It seems that the first run ended prematurely: it stopped at 254/279, and the 25 remaining cases (279 - 254) were then reported during the next retry, which pushed the counter from 10/9 up to 34/9. I don’t know the root cause of this, but it’s certainly a bug.
More details here: https://jenkins.cscs.ch/blue/organizations/jenkins/reframe-arolla-validation/detail/reframe-arolla-validation/4/pipeline/
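The symptom in the log can be reproduced with a toy model. This is only a sketch of one possible mechanism, not ReFrame's actual code: `ProgressPrinter` and `simulate` are hypothetical names, and the assumption is that the counter and total are reset for the retry while checks left over from the first run are still reported afterwards.

```python
class ProgressPrinter:
    """Toy (n/total) reporter; a hypothetical stand-in, not ReFrame's printer."""

    def __init__(self):
        self.total = 0
        self.done = 0
        self.lines = []

    def start_run(self, total):
        # Assumption: each run (or retry) resets the counter and the total.
        self.total = total
        self.done = 0

    def on_finish(self, name):
        self.done += 1
        self.lines.append(f"({self.done}/{self.total}) {name}")


def simulate(first_run_size, finished_in_first_run, retry_size):
    printer = ProgressPrinter()

    # First run is declared finished before every check has reported.
    printer.start_run(first_run_size)
    for i in range(finished_in_first_run):
        printer.on_finish(f"check{i}")
    leftover = first_run_size - finished_in_first_run

    # The retry resets the counter, but the leftover checks are still pending.
    printer.start_run(retry_size)
    for i in range(retry_size):
        printer.on_finish(f"retry{i}")
    for i in range(leftover):
        printer.on_finish(f"late{i}")
    return printer.lines


lines = simulate(first_run_size=279, finished_in_first_run=254, retry_size=9)
print(lines[-1])  # (34/9) late24
```

With the numbers from the log (279 checks, 254 reported, 9 retried), the 25 late checks are counted against the retry total, which gives the same final `(34/9)` as above.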
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
@ekouts I haven’t seen that happening anywhere else. You could potentially construct a series of tests to trigger this artificially.
I agree with you, I think we should assert that _completed_tasks and _ready_tasks are empty, although I didn’t find how they won’t be cleaned. The _retired_tasks could be non empty because of failing dependencies and they might be cleaned up in a later retry or not at all.
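A check along those lines could look roughly like the sketch below. Only the three attribute names come from the comment above; `ToyPolicy` and `check_clean_before_retry` are hypothetical and do not reflect how the real execution policy is structured.

```python
class ToyPolicy:
    """Hypothetical stand-in for an execution policy; only the attribute
    names are taken from the discussion above."""

    def __init__(self):
        self._ready_tasks = []
        self._completed_tasks = []
        self._retired_tasks = []

    def check_clean_before_retry(self):
        # _retired_tasks is deliberately not checked: it may legitimately be
        # non-empty because of failing dependencies.
        leftovers = {
            "_ready_tasks": len(self._ready_tasks),
            "_completed_tasks": len(self._completed_tasks),
        }
        dirty = {name: n for name, n in leftovers.items() if n}
        if dirty:
            raise AssertionError(
                f"tasks left over from the previous run: {dirty}"
            )


policy = ToyPolicy()
policy._retired_tasks.append("task_with_failed_dependency")
policy.check_clean_before_retry()  # passes: only retired tasks remain

policy._ready_tasks.append("task_still_pending")
try:
    policy.check_clean_before_retry()
except AssertionError as err:
    print(err)
```

Raising explicitly instead of using a bare `assert` statement keeps the check active even when Python runs with `-O`.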