Bug in numbering during retries encountered

For some reason this strange output can occur:

[       OK ] (231/279) FlexAlltoallTest on arolla:pn using PrgEnv-gnu [compile: 0.559s run: 21.334s total: 2072.256s]
[       OK ] (232/279) HaloCellExchangeTest on arolla:cn using PrgEnv-pgi [compile: 0.627s run: 21.459s total: 2083.825s]
[       OK ] (233/279) FlexAlltoallTest on arolla:pn using PrgEnv-pgi [compile: 0.777s run: 21.169s total: 2094.441s]
[       OK ] (234/279) HaloCellExchangeTest on arolla:cn using PrgEnv-gnu [compile: 0.435s run: 21.199s total: 2105.736s]
[       OK ] (235/279) DGEMMTest on arolla:pn using PrgEnv-gnu-nompi [compile: 0.451s run: 22.423s total: 2120.285s]
[       OK ] (236/279) KernelLatencyTest_sync on arolla:cn using PrgEnv-pgi [compile: 1.689s run: 15.531s total: 2125.056s]
[       OK ] (237/279) DGEMMTest on arolla:cn using PrgEnv-gnu-nompi [compile: 0.378s run: 10.441s total: 2136.125s]
[       OK ] (238/279) AllocSpeedTest_no on arolla:cn using PrgEnv-gnu [compile: 0.631s run: 10.444s total: 2147.662s]
[       OK ] (239/279) OpenaccCudaCpp on arolla:cn using PrgEnv-pgi [compile: 3.897s run: 5.305s total: 2157.925s]
[       OK ] (240/279) MultiDeviceOpenaccTest on arolla:cn using PrgEnv-pgi [compile: 1.740s run: 5.326s total: 2165.140s]
[       OK ] (241/279) GpuDirectCudaCheck on arolla:cn using PrgEnv-pgi [compile: 2.503s run: 5.313s total: 2173.099s]
[       OK ] (242/279) GpuDirectCudaCheck on arolla:cn using PrgEnv-gnu [compile: 1.761s run: 5.357s total: 2180.361s]
[       OK ] (243/279) GpuDirectAccCheck on arolla:cn using PrgEnv-pgi [compile: 0.542s run: 5.315s total: 2186.365s]
[       OK ] (244/279) CudaStressTest on arolla:cn using PrgEnv-pgi-nompi [compile: 1.473s run: 5.312s total: 2193.307s]
[       OK ] (245/279) CudaStressTest on arolla:cn using PrgEnv-gnu-nompi [compile: 1.604s run: 5.387s total: 2202.242s]
[       OK ] (246/279) CudaStressTest on arolla:cn using PrgEnv-gnu [compile: 1.621s run: 5.399s total: 2209.414s]
[     FAIL ] (247/279) HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 15.930s run: 5.384s total: 2230.874s]
[     FAIL ] (248/279) HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 13.805s run: 5.345s total: 2250.166s]
[     FAIL ] (249/279) HaloExchangeTest_default on arolla:cn using PrgEnv-gnu [compile: 13.870s run: 5.343s total: 2269.519s]
[     FAIL ] (250/279) AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 14.785s run: 5.350s total: 2289.798s]
[     FAIL ] (251/279) AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 14.555s run: 5.351s total: 2309.903s]
[     FAIL ] (252/279) AlltoallvTest_default on arolla:cn using PrgEnv-gnu [compile: 14.781s run: 5.345s total: 2330.184s]
[     FAIL ] (253/279) AutomaticArraysCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 0.337s run: n/a total: 2330.644s]
[     FAIL ] (254/279) AutomaticArraysCheck on arolla:cn using PrgEnv-gnu [compile: 0.000s run: n/a total: 2333.403s]
[----------] all spawned checks have finished

[==========] Retrying 7 failed check(s) (retry 1/2)
[----------] started processing AutomaticArraysCheck (AutomaticArraysCheck)
[ RUN      ] AutomaticArraysCheck on arolla:cn using PrgEnv-gnu
[     FAIL ] (1/9) AutomaticArraysCheck on arolla:cn using PrgEnv-gnu [compile: 0.000s run: n/a total: 0.018s]
[ RUN      ] AutomaticArraysCheck on arolla:cn using PrgEnv-gnu-nompi
[     FAIL ] (2/9) AutomaticArraysCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 0.000s run: n/a total: 0.015s]
[ RUN      ] AutomaticArraysCheck on arolla:cn using PrgEnv-pgi-nompi
[     FAIL ] (3/9) AutomaticArraysCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 0.338s run: n/a total: 0.355s]
[----------] finished processing AutomaticArraysCheck (AutomaticArraysCheck)

[----------] started processing AlltoallvTest_default (AlltoallvTest_default)
[ RUN      ] AlltoallvTest_default on arolla:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_default (AlltoallvTest_default)

[----------] started processing AlltoallvTest_nocomm (AlltoallvTest_nocomm)
[ RUN      ] AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu
[     HOLD ] AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_nocomm (AlltoallvTest_nocomm)

[----------] started processing AlltoallvTest_nocomp (AlltoallvTest_nocomp)
[ RUN      ] AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu
[     HOLD ] AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_nocomp (AlltoallvTest_nocomp)

[----------] started processing HaloExchangeTest_default (HaloExchangeTest_default)
[ RUN      ] HaloExchangeTest_default on arolla:cn using PrgEnv-gnu
[     HOLD ] HaloExchangeTest_default on arolla:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_default (HaloExchangeTest_default)

[----------] started processing HaloExchangeTest_nocomm (HaloExchangeTest_nocomm)
[ RUN      ] HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu
[     HOLD ] HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_nocomm (HaloExchangeTest_nocomm)

[----------] started processing HaloExchangeTest_nocomp (HaloExchangeTest_nocomp)
[ RUN      ] HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu
[     HOLD ] HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_nocomp (HaloExchangeTest_nocomp)

[----------] waiting for spawned checks to finish
[     FAIL ] (4/9) AlltoallvTest_default on arolla:cn using PrgEnv-gnu [compile: 14.916s run: 3.517s total: 18.465s]
[     FAIL ] (5/9) HaloExchangeTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 13.807s run: 3.376s total: 20.070s]
[     FAIL ] (6/9) HaloExchangeTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 13.856s run: 5.858s total: 39.930s]
[     FAIL ] (7/9) HaloExchangeTest_default on arolla:cn using PrgEnv-gnu [compile: 14.041s run: 4.898s total: 59.020s]
[     FAIL ] (8/9) AlltoallvTest_nocomp on arolla:cn using PrgEnv-gnu [compile: 14.736s run: 5.293s total: 79.192s]
[     FAIL ] (9/9) AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 14.641s run: 5.331s total: 99.302s]
[       OK ] (10/9) NetCDFTest_f90_static on arolla:cn using PrgEnv-pgi-nompi [compile: 1.586s run: 5.413s total: 2455.413s]
[       OK ] (11/9) NetCDFTest_f90_static on arolla:cn using PrgEnv-gnu-nompi [compile: 0.841s run: 5.514s total: 2461.916s]
[       OK ] (12/9) NetCDFTest_f90_dynamic on arolla:cn using PrgEnv-pgi-nompi [compile: 0.727s run: 5.461s total: 2468.249s]
[       OK ] (13/9) NetCDFTest_f90_dynamic on arolla:cn using PrgEnv-gnu-nompi [compile: 0.586s run: 5.495s total: 2474.479s]
[       OK ] (14/9) NetCDFTest_c_static on arolla:cn using PrgEnv-pgi-nompi [compile: 0.669s run: 5.464s total: 2480.763s]
[       OK ] (15/9) NetCDFTest_c_static on arolla:cn using PrgEnv-gnu-nompi [compile: 0.621s run: 5.471s total: 2486.995s]
[       OK ] (16/9) NetCDFTest_c_dynamic on arolla:cn using PrgEnv-pgi-nompi [compile: 0.629s run: 5.462s total: 2493.234s]
[       OK ] (17/9) NetCDFTest_c_dynamic on arolla:cn using PrgEnv-gnu-nompi [compile: 0.579s run: 5.496s total: 2499.461s]
[       OK ] (18/9) NetCDFTest_cpp_static on arolla:cn using PrgEnv-pgi-nompi [compile: 1.403s run: 5.542s total: 2506.552s]
[       OK ] (19/9) NetCDFTest_cpp_static on arolla:cn using PrgEnv-gnu-nompi [compile: 1.092s run: 5.543s total: 2513.355s]
[       OK ] (20/9) NetCDFTest_cpp_dynamic on arolla:cn using PrgEnv-pgi-nompi [compile: 1.292s run: 5.560s total: 2520.358s]
[       OK ] (21/9) NetCDFTest_cpp_dynamic on arolla:cn using PrgEnv-gnu-nompi [compile: 0.926s run: 5.562s total: 2526.992s]
[       OK ] (22/9) GpuBandwidthCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 2.199s run: 92.328s total: 2621.801s]
[       OK ] (23/9) CudaSimpleMPICheck on arolla:cn using PrgEnv-gnu [compile: 2.000s run: 5.400s total: 2629.342s]
[       OK ] (24/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 1.732s run: 5.306s total: 2636.522s]
[       OK ] (25/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-pgi [compile: 1.715s run: 5.302s total: 2643.677s]
[       OK ] (26/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 1.865s run: 5.389s total: 2651.071s]
[       OK ] (27/9) CudaConcurrentKernelsCheck on arolla:cn using PrgEnv-gnu [compile: 1.862s run: 5.385s total: 2658.462s]
[       OK ] (28/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 1.680s run: 5.309s total: 2665.592s]
[       OK ] (29/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-pgi [compile: 1.660s run: 5.309s total: 2672.714s]
[       OK ] (30/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 1.841s run: 5.391s total: 2680.085s]
[       OK ] (31/9) CudaDeviceQueryCheck on arolla:cn using PrgEnv-gnu [compile: 1.798s run: 5.388s total: 2687.407s]
[       OK ] (32/9) CudaMatrixmulCublasCheck on arolla:cn using PrgEnv-pgi-nompi [compile: 1.799s run: 5.308s total: 2694.655s]
[       OK ] (33/9) CudaMatrixmulCublasCheck on arolla:cn using PrgEnv-pgi [compile: 1.841s run: 5.300s total: 2701.934s]
[       OK ] (34/9) CudaMatrixmulCublasCheck on arolla:cn using PrgEnv-gnu-nompi [compile: 1.933s run: 5.437s total: 2709.481s]
[----------] all spawned checks have finished

It seems that the first run ended prematurely, after 254 of the 279 cases, and the 25 remaining cases were then reported during the retry, which is why the counter runs from (10/9) to (34/9). I don’t know the root cause of this, but it’s certainly a bug.
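
To spot this symptom in a saved log, a small script can flag every result line whose counter exceeds its total. This is only a sketch: the regular expression is inferred from the output above rather than from any documented log format, and `find_overflowing_counters` is a hypothetical helper name.

```python
import re

# Matches result lines such as
# "[       OK ] (10/9) NetCDFTest_f90_static on arolla:cn using ..."
# The pattern is an assumption based on the log excerpt in this issue.
RESULT_RE = re.compile(r"^\[\s*(OK|FAIL)\s*\]\s*\((\d+)/(\d+)\)\s+(\S+)")


def find_overflowing_counters(log_text):
    """Return (index, total, check_name) for lines where index > total."""
    overflow = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # not a result line (e.g. RUN, HOLD or separator lines)
        index, total = int(match.group(2)), int(match.group(3))
        if index > total:
            overflow.append((index, total, match.group(4)))
    return overflow


sample = """\
[     FAIL ] (9/9) AlltoallvTest_nocomm on arolla:cn using PrgEnv-gnu [compile: 14.641s run: 5.331s total: 99.302s]
[       OK ] (10/9) NetCDFTest_f90_static on arolla:cn using PrgEnv-pgi-nompi [compile: 1.586s run: 5.413s total: 2455.413s]
[       OK ] (11/9) NetCDFTest_f90_static on arolla:cn using PrgEnv-gnu-nompi [compile: 0.841s run: 5.514s total: 2461.916s]
"""

print(find_overflowing_counters(sample))
```

Run against the full output above, it should report the 25 lines from (10/9) to (34/9).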

More details here: https://jenkins.cscs.ch/blue/organizations/jenkins/reframe-arolla-validation/detail/reframe-arolla-validation/4/pipeline/

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
vkarak commented, Jul 16, 2020

@ekouts I haven’t seen that happening anywhere else. You could potentially construct a series of tests to trigger this artificially.

0 reactions
ekouts commented, Jul 21, 2020

> If this is happening, it’s because we don’t clean up the lists after we finish, right? Shouldn’t we clean them or assert that they are indeed empty?

I agree with you. I think we should assert that _completed_tasks and _ready_tasks are empty, although I couldn’t find a way for them not to be cleaned up. The _retired_tasks list could be non-empty because of failing dependencies, and those tasks might be cleaned up in a later retry or not at all.
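
The proposed check could look roughly like the sketch below. It is an illustration only, not ReFrame's actual code: the class `ToyAsyncPolicy` and its `run` and `exit` methods are hypothetical, and only the three list names come from the discussion above.

```python
class ToyAsyncPolicy:
    """Hypothetical stand-in for an execution policy (not ReFrame's class)."""

    def __init__(self):
        self._ready_tasks = []
        self._completed_tasks = []
        self._retired_tasks = []

    def run(self, tasks):
        # Toy pipeline: every task moves ready -> completed -> retired.
        self._ready_tasks.extend(tasks)
        while self._ready_tasks:
            self._completed_tasks.append(self._ready_tasks.pop(0))
        while self._completed_tasks:
            self._retired_tasks.append(self._completed_tasks.pop(0))

    def exit(self):
        # Invariant suggested in the comment: no task may be left in these
        # two lists when a run (or a retry) finishes.
        assert not self._ready_tasks, (
            f"leftover ready tasks: {self._ready_tasks}"
        )
        assert not self._completed_tasks, (
            f"leftover completed tasks: {self._completed_tasks}"
        )
        # _retired_tasks is deliberately not checked: per the comment, it may
        # legitimately be non-empty because of failing dependencies.


policy = ToyAsyncPolicy()
policy.run(["check_a", "check_b"])
policy.exit()  # passes: both checked lists were drained
```

With such an assertion in place, tasks leaking from one run into the next retry would fail loudly at the end of the run instead of surfacing later as counters like (10/9).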
