
Improve thread shutdown management in ANCM


Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I want to revive issue #43651, as we have now been able to create a repro case that fails nearly every time for several apps. The issue seems to happen when the CPU is under heavy load and many sites are starting at the same time. Some of them start successfully, but others stay in a zombie-like state, stuck initializing: the processes exist, but the service is never able to serve requests again, and IIS does nothing to stop or reset them either.

The problem arises for us specifically on servers that have a lot of sites installed (about 200 per box), all with preload enabled. Some of them have a heavy initialization process, so every time we restart the server, CPU utilization on the box maxes out for the first 10 minutes or so. Once initialization is done, CPU usage comes back down to normal levels.

At that point, though, we are left with a number of services that never come to life and are stuck loading aspnetcorev2_inprocess.dll, as explained in the previous issue. The zombie processes show a distinctive memory-consumption pattern, so they are easy to spot once you know what to look for. In the repro case I created, "SlowStartWebApi" represents the app that needs to do some initialization when starting (and maxes out the CPU). Once the apps are initialized, their memory should look like this (around 8 MB): (screenshot: normal memory consumption)

The dead ones, however, look similar to this (around 4 MB): (screenshot: dead memory consumption)
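The memory pattern above can be turned into a quick check. This is a minimal C++ sketch, assuming you already have a memory reading per worker process (the issue does not say how the figures were collected). It flags any worker below the midpoint between the healthy (~8 MB) and dead (~4 MB) figures. `WorkerSample`, `suspectZombies` and the midpoint cutoff are hypothetical choices, not part of the repro project.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical snapshot of one w3wp.exe worker; how the memory figure is
// collected (Task Manager, a performance counter, etc.) is left open.
struct WorkerSample {
    std::string appPool;
    double memoryMb;
};

// Returns the app pools whose worker sits below the midpoint between the
// healthy (~8 MB) and dead (~4 MB) figures reported in the issue.
// The midpoint cutoff is an assumption, not something the issue specifies.
std::vector<std::string> suspectZombies(const std::vector<WorkerSample>& samples,
                                        double healthyMb = 8.0,
                                        double deadMb = 4.0) {
    assert(deadMb < healthyMb);
    const double cutoff = (healthyMb + deadMb) / 2.0;  // 6 MB with the defaults
    std::vector<std::string> suspects;
    for (const auto& sample : samples) {
        if (sample.memoryMb < cutoff) {
            suspects.push_back(sample.appPool);
        }
    }
    return suspects;
}
```

For example, with made-up readings of 8.1 MB, 4.2 MB and 7.9 MB, only the 4.2 MB worker is flagged. A fixed cutoff is only a heuristic; a flagged worker should still be confirmed with a dump or a test request.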

Description from the old issue #43651, for reference:

We’ve recently converted one of our web applications to ASP.NET Core on .NET 6 (from ASP.NET MVC/WebAPI on .NET Framework 4.8) and the new version is slowly rolling out to our customer base. This is hosted in IIS. We’ve had this running in production for several weeks now with no problems… until now.

This morning one of those sites appears to be hanging and was not responding to web requests. Upon further investigation by taking a process dump of w3wp.exe and examining it, it appears that the ASP.NET Core Module had hung during application initialization.

Extremely strangely, this occurred identically across two different servers, each serving the same site. This affected application is also the only ASP.NET Core application on each of those servers.

The stack trace of the only thread actually doing any kind of work is:

```
ntdll.dll!NtWaitForSingleObject()
ntdll.dll!LdrpDrainWorkQueue()
ntdll.dll!LdrpLoadDllInternal()
ntdll.dll!LdrpLoadDll()
ntdll.dll!LdrLoadDll()
KERNELBASE.dll!LoadLibraryExW()
aspnetcorev2.dll!HandlerResolver::LoadRequestHandlerAssembly(const IHttpApplication & pApplication, const std::filesystem::path & shadowCopyPath, const ShimOptions & pConfiguration, std::unique_ptr<ApplicationFactory,std::default_delete<ApplicationFactory>> & pApplicationFactory, ErrorContext & errorContext) Line 111
	at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\HandlerResolver.cpp(111)
aspnetcorev2.dll!HandlerResolver::GetApplicationFactory(const IHttpApplication & pApplication, const std::filesystem::path & shadowCopyPath, std::unique_ptr<ApplicationFactory,std::default_delete<ApplicationFactory>> & pApplicationFactory, const ShimOptions & options, ErrorContext & errorContext) Line 172
	at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\HandlerResolver.cpp(172)
aspnetcorev2.dll!APPLICATION_INFO::TryCreateApplication(IHttpContext & pHttpContext, const ShimOptions & options, ErrorContext & error) Line 195
	at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\applicationinfo.cpp(195)
aspnetcorev2.dll!APPLICATION_INFO::CreateApplication(IHttpContext & pHttpContext) Line 106
	at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\applicationinfo.cpp(106)
aspnetcorev2.dll!APPLICATION_INFO::CreateHandler(IHttpContext & pHttpContext, std::unique_ptr<IREQUEST_HANDLER,IREQUEST_HANDLER_DELETER> & pHandler) Line 63
	at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\applicationinfo.cpp(63)
aspnetcorev2.dll!ASPNET_CORE_PROXY_MODULE::OnExecuteRequestHandler(IHttpContext * pHttpContext, IHttpEventProvider * __formal) Line 103
	at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\proxymodule.cpp(103)
iiscore.dll!NOTIFICATION_CONTEXT::RequestDoWork()
iiscore.dll!NOTIFICATION_CONTEXT::CallModulesInternal()
iiscore.dll!NOTIFICATION_CONTEXT::CallModules(int,unsigned long,long,unsigned long,class W3_CONTEXT_BASE *,class IHttpEventProvider *)
iiscore.dll!NOTIFICATION_MAIN::DoWork()
iiscore.dll!W3_CONTEXT_BASE::StartNotificationLoop(class NOTIFICATION_CONTEXT *,int)
iiscore.dll!APPLICATION_PRELOAD_PROVIDER::ExecuteRequest(class IHttpContext *,class IHttpUser *)
warmup.dll!DoApplicationPreload(class IGlobalApplicationPreloadProvider *)
iiscore.dll!W3_SERVER::GlobalNotify()
iiscore.dll!W3_SERVER::NotifyApplicationPreload(int)
iiscore.dll!IISCORE_PROTOCOL_MANAGER::PreloadApplication(unsigned long,unsigned short const *,int)
w3wphost.dll!W3WP_HOST::ProcessHttpPreloadApplications(int)
w3wphost.dll!W3WP_HOST::ProcessPreloadApplications(unsigned long)
w3wphost.dll!WP_IPM::AcceptMessage()
iisutil.dll!IPM_MESSAGE_PIPE::MessagePipeCompletion(void *,unsigned char)
ntdll.dll!RtlpTpWaitCallback()
ntdll.dll!TppExecuteWaitCallback()
ntdll.dll!TppWorkerThread()
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
```

The path parameter being passed by `aspnetcorev2.dll`'s `LoadRequestHandlerAssembly` to `LoadLibrary` is `C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App\6.0.7\aspnetcorev2_inprocess.dll`.

The stack trace also suggests to me that this was happening in application preloading (we do have preload enabled).

I have a process dump available on request, if you have somewhere secure that I can upload it.

Expected Behavior

After the slow start, all sites are initialized correctly and able to serve requests.

Steps To Reproduce

I can provide a VM with everything already installed and failing every time there is a reset. I can also provide memory dumps of the failing processes if required as well.

Introduction to the repro project

I uploaded the basic project to https://github.com/jdmerinor/aspdotnetcorehangingreprocase. It contains three pieces:

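  1. SlowStartWebApi, the server that represents the slow-starting service that consumes a good chunk of CPU. It is nothing special: the typical .NET 6 example weather Web API, with these lines added before calling `app.Run();` to burn CPU:
```csharp
// Top of Program.cs; these namespaces are not among the Web SDK implicit usings
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;
// Before app.Run(); stopwatch and count were not declared in the original snippet (assumed)
var stopwatch = Stopwatch.StartNew();
var count = 0;
using var pacho = SHA512.Create();
var buffer = Encoding.ASCII.GetBytes("sadfhasdhfhasdfklsadjhfklsdahfojhdsaf");
while (stopwatch.Elapsed < TimeSpan.FromMinutes(1.5)) // hash for ~1.5 minutes to max out the CPU
{
    count--;
    pacho.ComputeHash(buffer);
    count++;
}
```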
  2. AppInstaller, which just uses Microsoft.Web.Administration to install all the sites conveniently (100 of them) with the following settings:
```csharp
// Requires: using Microsoft.Web.Administration;
// serverManager (a ServerManager), appPoolName and application come from the
// surrounding installer code, which is not shown here.
var appPool = serverManager.ApplicationPools.Add(appPoolName);
appPool.ProcessModel.LoadUserProfile = false;
appPool.Recycling.PeriodicRestart.Time = TimeSpan.Zero;
appPool.ManagedPipelineMode = ManagedPipelineMode.Integrated;
appPool.StartMode = StartMode.AlwaysRunning;
appPool.ProcessModel.IdleTimeout = TimeSpan.Zero;
appPool.Enable32BitAppOnWin64 = false;
appPool.ManagedRuntimeVersion = string.Empty; // "No Managed Code"

application.ApplicationPoolName = appPool.Name;
application["preloadEnabled"] = true; // IMPORTANT FOR THE REPRODUCTION
// Changes are persisted only after serverManager.CommitChanges().
```
  3. UIServer, which is just a normal .NET 6 example weather Web API. I added it because I wasn't sure whether reproducing the bug required more than one app per site... I suspect it would still fail without it, but I left it here for completeness.

Reproduction steps

I used the following steps to reproduce the issue:

  1. Download a clean Windows Server 2019 VHD from https://www.microsoft.com/en-us/evalcenter/download-windows-server-2019

  2. Add the IIS role and make sure Application Initialization (under Application Development) is also installed. (screenshot: Application Initialization)

  3. Install the .NET 6 hosting bundle from https://dotnet.microsoft.com/en-us/download/dotnet/thank-you/runtime-aspnetcore-6.0.11-windows-hosting-bundle-installer

  4. Put the apps' publish folders on the desktop (using `dotnet publish --configuration Release`).

  5. Make sure IIS_IUSRS is added to the folder permissions.

  6. Make sure the server has a self-signed or valid HTTPS certificate for the sites (otherwise you will get "connection refused" when trying to connect to the apps over HTTPS).

  7. Use AppInstaller to add a bunch of apps (you might need to change the code to suit your server paths).

  8. Add hosts file entries to be able to hit the different sites from within the server

  9. Restart the server (this initially takes a while because the "SlowServer" is doing a bunch of hashing to max out the CPU...).

  10. Once initialization is done, try calling some of the endpoints and you will find that they don't respond at all: they load forever and never recover. The broken ones are easy to spot because their w3wp.exe processes have very little memory allocated to them.
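Step 8 can be scripted instead of editing the hosts file by hand. This is a minimal C++ sketch that builds hosts-file lines mapping hypothetical host names (`slowstart1.local`, `slowstart2.local`, etc.) to the loopback address. The names are an assumption: they must match whatever bindings AppInstaller actually creates, which the issue does not show. `hostsEntries` is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds hosts-file lines that map hypothetical host names such as
// "slowstart1.local" to the loopback address so every site can be reached
// from the server itself. Prefix and suffix must match the site bindings
// your installer actually creates (assumed here, not taken from the repo).
std::vector<std::string> hostsEntries(const std::string& prefix, int count,
                                      const std::string& suffix = ".local",
                                      const std::string& address = "127.0.0.1") {
    assert(count >= 0);
    std::vector<std::string> lines;
    lines.reserve(static_cast<std::size_t>(count));
    for (int i = 1; i <= count; ++i) {
        // hosts format: "<address> <hostname>", one mapping per line
        lines.push_back(address + " " + prefix + std::to_string(i) + suffix);
    }
    return lines;
}
```

For example, `hostsEntries("slowstart", 100)` yields 100 lines starting with `127.0.0.1 slowstart1.local`; printing them and appending the output to `C:\Windows\System32\drivers\etc\hosts` from an elevated prompt covers step 8.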

Exceptions (if any)

No response

.NET Version

6.0.11

Anything else?

No response

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments:11 (9 by maintainers)

Top GitHub Comments

adityamandaleeka commented, Aug 9, 2023 (1 reaction)

Thanks @yaakov-h. I believe you are correct that you can keep your app on 6.0 and it should use the aspnetcorev2.dll dropped by the 8.0 hosting bundle, which will include this fix.

adityamandaleeka commented, Jun 29, 2023 (1 reaction)

@jdmerinor Looks like this is caused by some code using TerminateThread in a case where the target thread is holding the loader lock. I’m going to look into how to fix this… including whether we can rework this code to eliminate the use of TerminateThread entirely.

The reason why you see this happening when under load is that the termination is in response to a timeout (which is of course more likely to happen when things are slow).
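The maintainer's diagnosis is that a timeout handler calls TerminateThread on a thread that may be holding the loader lock. A terminated thread never releases a critical section it owns, which is consistent with the reported hang inside `LoadLibraryExW`. A common alternative is cooperative shutdown: ask the thread to stop, wait with a timeout, and never kill it. The C++ sketch below shows that generic pattern under stated assumptions. It is not ANCM's code or the eventual fix, and `CooperativeWorker`, `requestStop` and `waitForExit` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Generic cooperative-shutdown sketch (hypothetical class, not ANCM code).
// The worker only checks for a stop request between units of work, and the
// owner never kills it: a timed-out wait is reported, not "fixed" by force.
class CooperativeWorker {
public:
    explicit CooperativeWorker(std::chrono::milliseconds unitOfWork)
        : unit_(unitOfWork), thread_([this] { run(); }) {}

    // Always joins; may block for up to one unit of work.
    ~CooperativeWorker() {
        requestStop();
        thread_.join();
    }

    void requestStop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
    }

    // Returns true if the worker exited within `timeout`. On false, the caller
    // decides what to do (log, keep waiting, fail the request) instead of
    // calling TerminateThread, which could leave a lock such as the loader
    // lock held forever.
    bool waitForExit(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return exited_; });
    }

private:
    void run() {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(m_);
                if (stop_) break;
            }
            // One non-interruptible unit of work, e.g. a slow LoadLibrary call
            // on a CPU-starved machine. No lock is held while it runs.
            std::this_thread::sleep_for(unit_);
        }
        {
            std::lock_guard<std::mutex> lock(m_);
            exited_ = true;
        }
        cv_.notify_all();
    }

    std::chrono::milliseconds unit_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    bool exited_ = false;
    std::thread thread_;  // declared last: starts after the members above exist
};
```

With a 2-second unit of work, a 200 ms `waitForExit` right after `requestStop()` returns false, yet the worker still finishes its current unit and exits cleanly. The trade-off is that a stuck worker may linger longer than the timeout, but it is never killed while holding a lock that other threads need.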
