.NET (dotnet) tests get stuck with no reported errors or warnings on GitLab Runner's Kubernetes executor, and I could not find the cause
Describe the bug
We have a .NET repository whose pipelines run three stages: `build`, `test`, and `deployment`.
One of the test jobs always gets stuck at a random point without reporting any error or warning that would explain this behavior. Note that this happens on GitLab Runner's Kubernetes executor; the same job does not fail when using GitLab Runner's Docker executor.
We're using .NET 6 and the latest Docker image of the .NET SDK, `mcr.microsoft.com/dotnet/sdk:6.0`.
A typical sample of the log output at the point where the job gets stuck:
Test run for /builds/vistaprint-org/channel-technology/real-time-offer-service/api/src/MerchandisingFeeds.AcceptanceTests/bin/Release/net6.0/MerchandisingFeeds.AcceptanceTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.3.0 (x64)
Copyright (c) Microsoft Corporation. All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
I also enabled detailed log verbosity, and no errors or warnings are reported.
I tried several workarounds and checks, some of which were proposed in similar reported issues:
- Run test jobs sequentially.
- Change Dockerfiles' ENTRYPOINT from `dotnet` to `bash`.
- Run the job on different instances with different computing capacities.
- Verified that we have enough resources and disk space.
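None of the checks above limits how long a hung job occupies the runner. As a stopgap while the cause is unknown, the test command can be wrapped in a hard time limit so the job fails quickly instead of hanging. This is a minimal sketch, assuming the `timeout` utility from coreutils is available in the image; the function name `run_bounded` and the example `dotnet test` arguments are hypothetical and not taken from the pipeline described here.

```shell
# run_bounded SECONDS CMD [ARGS...]
# Runs CMD and sends it SIGTERM after SECONDS; if it is still alive
# 10 seconds later, `timeout` follows up with SIGKILL.
run_bounded() {
  limit="$1"
  shift
  timeout -k 10 "$limit" "$@"
  status=$?
  # `timeout` exits with 124 when the time limit was reached.
  if [ "$status" -eq 124 ]; then
    echo "command exceeded ${limit}s and was terminated" >&2
  fi
  return "$status"
}

# Hypothetical usage in a CI script (arguments are illustrative only):
#   run_bounded 1200 dotnet test -c Release --no-build
```

This does not fix the hang. It only turns a silent stall into a visible failure with a distinct exit code, which makes retries and alerting easier.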
I also reviewed the logs of the Kubernetes pod and of the container where the job's service is running; no errors, warnings, or anything related are reported.
Some say this would be solved with .NET 5, but in my case it would be really difficult to run the project on .NET 5, as that implies a lot of code changes.
Similar failures have already been reported, with no clear cause or solution. Related issues:
- https://forum.gitlab.com/t/dotnet-test-hangs-in-gitlab-gitlab-runner-13-9-0-rc2/50977
- https://github.com/dotnet/sdk/issues/9452
- https://forum.gitlab.com/t/dotnet-test-hangs-in-gitlab-gitlab-runner-13-9-0-rc2/50977/3
Further technical details
- Include the output of `dotnet --info`
$ dotnet --info
.NET SDK (reflecting any global.json):
Version: 6.0.400
Commit: 7771abd614
Runtime Environment:
OS Name: debian
OS Version: 11
OS Platform: Linux
RID: debian.11-x64
Base Path: /usr/share/dotnet/sdk/6.0.400/
global.json file:
Not found
Host:
Version: 6.0.8
Architecture: x64
Commit: 55fb7ef977
.NET SDKs installed:
6.0.400 [/usr/share/dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 6.0.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 6.0.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Download .NET:
https://aka.ms/dotnet-download
Learn about .NET Runtimes and SDKs:
https://aka.ms/dotnet/runtimes-sdk-info
Issue Analytics
- State:
- Created a year ago
- Reactions:5
- Comments:8 (1 by maintainers)
Top GitHub Comments
Okay this “solution” works for me ➡️ https://github.com/dotnet/sdk/issues/9452#issuecomment-803033183
Hi, I think this issue is due to a bug in the `Process` type. I opened an issue about it here: https://github.com/dotnet/runtime/issues/51277
In summary:
Here I guess `--blame-hang-timeout 15min` doesn’t do anything because it will try to kill a process that is already dead, and something is still stuck on `WaitForExit()`.
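The comment does not spell out why a wait can outlast the process it waits on. One general way this can happen is when a descendant process keeps an inherited output pipe open, so a reader waiting for end-of-file keeps blocking after the direct child has exited. The sketch below shows that effect in plain shell as an analogy only. It is an assumption that the `WaitForExit()` hang described here involves the same mechanism, and the variable names are made up for illustration.

```shell
# Case 1: the background `sleep` inherits the pipe used by $(...).
# The inner `sh` exits right after `echo`, but the command substitution
# keeps reading until every holder of the pipe's write end has exited.
start=$(date +%s)
out_inherited=$(sh -c 'sleep 2 & echo parent-exited')
inherited_elapsed=$(( $(date +%s) - start ))

# Case 2: the background `sleep` has its output redirected away from
# the pipe, so the substitution returns as soon as the inner `sh` exits.
start=$(date +%s)
out_detached=$(sh -c 'sleep 2 >/dev/null 2>&1 & echo parent-exited')
detached_elapsed=$(( $(date +%s) - start ))

echo "inherited pipe: ${inherited_elapsed}s, detached: ${detached_elapsed}s"
```

In both cases the captured text is the same and the direct child exits immediately. Only the lifetime of the inherited pipe differs, which is why killing an already-exited process would not release the waiter.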