Usage of fibers causes random .NET runtime crashes
Version All
Description
In its main loop (ScriptMain), SHVDN makes use of fibers and also calls into unmanaged code implemented using fibers. This is not supported by .NET and will inevitably lead to random crashes, most notably by making the .NET exception handler in CLRVectoredExceptionHandler assume that there is no stack space left. The Microsoft docs state: “The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.” With my limited knowledge of the project, it is unclear to me why a fiber-based approach was chosen here, as it seems like a grave design mistake, but I did not study the entire project.
Crash Details
Since a fiber is not its own thread but behaves like one in certain ways, it has its own stack space. This means that a single thread will have different lower and upper stack bounds depending on whether a fiber is running or not. Whenever an exception occurs in .NET, the crash handler in CLRVectoredExceptionHandler gets called. (I am referencing the initial coreclr commit here, as it is closer to what .NET Framework uses than the current .NET Core implementation, which does not use stack probing at all and hence is not affected.) The call to Thread::IsStackSpaceAvailable returns false when on a fiber stack, since its internal call to GetLastNormalStackAddress uses a cached stack limit. Naturally, this should be using the fiber’s stack bounds, but due to the caching it does not. While this is definitely not handled ideally by .NET Framework (and is fixed in .NET Core), the issue remains that fibers are not supported. Unfortunately, this means that any exception, whether it is a C++ exception as mentioned in #936 or a .NET exception, will cause the runtime to panic and call DontCallDirectlyForceStackOverflow, subsequently terminating the process. Please note that this crash does not occur on every machine and appears to happen at random, but since I had access to a user machine where it happened on every single exception, it was very easy to debug and pinpoint.
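To make the caching problem concrete, here is a minimal model of it in portable C++. It is only a sketch of the idea described above and not CLR code: `StackBounds`, `hasStackSpace`, and `CachedLimitThread` are hypothetical names, and the addresses used with them are made up.

```cpp
#include <cstdint>

// Simplified model for illustration only; none of these names exist in the CLR.
struct StackBounds {
    std::uintptr_t low;   // lowest usable address (the stack grows down toward it)
    std::uintptr_t high;  // stack base (highest address)
};

// True when `sp` lies inside `bounds` and at least `needed` bytes remain
// between `sp` and the lower limit.
bool hasStackSpace(StackBounds bounds, std::uintptr_t sp, std::uintptr_t needed) {
    return sp >= bounds.low && sp <= bounds.high && sp - bounds.low >= needed;
}

// Models a runtime that records the thread's stack bounds once and keeps
// using them, even after execution has moved to a fiber's separate stack.
class CachedLimitThread {
public:
    explicit CachedLimitThread(StackBounds threadStack) : cached_(threadStack) {}

    // The check only ever consults the cached bounds of the original stack.
    bool isStackSpaceAvailable(std::uintptr_t sp, std::uintptr_t needed) const {
        return hasStackSpace(cached_, sp, needed);
    }

private:
    StackBounds cached_;
};
```

With a thread stack of, say, 0x70000000 to 0x70100000 and a fiber stack of 0x20000000 to 0x20100000, a stack pointer in the middle of the fiber stack passes `hasStackSpace` against the fiber's own bounds but fails `isStackSpaceAvailable`, because the cached limits still describe the thread's original stack. In this model, that mismatch is what would make a handler conclude there is no stack space left.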
Example stack trace where the offending line is a NullReferenceException in .NET wrapped by try-catch (which is not hit for the reasons outlined above):
0:000> !dumpstack
OS Thread Id: 0x882c (0)
Current frame: clr!DontCallDirectlyForceStackOverflow+0x10
Child-SP RetAddr Caller, Callee
00000011620f31f0 00007ffc7bbe9a11 clr!CLRVectoredExceptionHandler+0xa8, calling clr!DontCallDirectlyForceStackOverflow
00000011620f3220 00007ffc7ba2f96e clr!SaveCurrentExceptionInfo+0x72, calling clr!ClrFlsSetValue
00000011620f3250 00007ffc7ba2fdcf clr!CLRVectoredExceptionHandlerShim+0xa3, calling clr!CLRVectoredExceptionHandler
00000011620f3280 00007ffc97e883dc ntdll!RtlpCallVectoredHandlers+0x108, calling ntdll!guard_dispatch_icall_nop
00000011620f3320 00007ffc97e5b406 ntdll!RtlDispatchException+0x66, calling ntdll!RtlpCallVectoredHandlers
00000011620f3350 00007ffc7b8d882a clr!invokeCompileMethod+0x97, calling clr!invokeCompileMethodHelper
00000011620f33c0 00007ffc7b8d875e clr!CallCompileMethodWithSEHWrapper+0xe5
00000011620f33f0 00007ffc97e25d21 ntdll!RtlFreeHeap+0x51, calling ntdll!RtlpFreeHeapInternal
00000011620f3430 00007ffc7b8c5809 clr!EEHeapFreeInProcessHeap+0x45, calling KERNEL32!HeapFreeStub
00000011620f3460 00007ffc7b8d864d clr!UnsafeJitFunction+0x81b, calling clr!_security_check_cookie
00000011620f3530 00007ffc97eafe3e ntdll!KiUserExceptionDispatch+0x2e, calling ntdll!RtlDispatchException
00000011620f4330 00007ffc1ca6dc66 (MethodDesc 00007ffc1c87e010 +0x26 Rage.Attributes.PluginAttribute.get_Name()) ====> Exception Code c0000005 cxr@00000011620f3540 exr@00000011620f3a30
Resolution
I was able to fix this issue by removing the fiber logic from ScriptMain and by no longer relying on SHV’s scriptRegister. Unfortunately, you will have to provide your own script VM tick hook, as SHV itself uses fibers and just removing the fiber logic from SHVDN is not enough. For a simple PoC, I hooked a native and ticked SHVDN from there, and it worked fine: no more fiber-related crashes! According to my own testing, you should still be able to use SHV to receive keyboard callbacks.
If you have any questions or feedback as to why fibers were used (or must be used), please let me know.
Issue Analytics
- Created 3 years ago
- Reactions: 3
- Comments: 34 (25 by maintainers)
Top GitHub Comments
The reason RPH reimplements the main script loop is to allow execution of certain RPH functionality, such as console commands (which may run natives) even when the game is paused or is not ticking script threads for another reason. At its heart, the tick for plugins is still handled via a normal scrThread in the list and you probably want to do the same.
Ah, I just remembered you made #1181. That PR didn’t mention that you can close the thread handle (to avoid a thread handle leak, which iirc results in the thread persisting), so maybe only a few things need to change and we can reuse the majority of it? For the thread handle leak, I have a few cases to show: https://github.com/Lyall/IshinFix/issues/5 https://github.com/kagikn/ExeIntegrityBypassAgainstRGL/blob/40c18129692a5316ff0634a1ce736092bb765769/ExeIntegrityBypassAgainstRGL/dllmain.cpp#L145
Also, crosire didn’t mention in said PR that scripts ran on .NET foreground threads, which can prevent the game from exiting (I changed this in da63afd so that scripts run on background threads).