Threading bug with ASP.NET Core 2.2 high-volume production system
A few questions before you begin:
Is this an issue related to the Serilog core project or one of the sinks or community projects? This issue list is intended for Serilog core issues. If this issue relates to a sink or related project, please log it on the related repository. Please use Gitter chat and Stack Overflow for discussions and questions.
Does this issue relate to a new feature or an existing bug?
- Bug
- New Feature
What version of Serilog is affected? Please list the related NuGet package. 2.8
What is the target framework and operating system? See target frameworks & net standard matrix.
- netCore 2.0
- netCore 1.0
- 4.7
- 4.6.x
- 4.5.x
Please describe the current behavior?
I haven’t been able to reproduce the behavior locally, as it seems to require quite a bit of load, so it only happens in our production systems (which from time to time have more than 10k concurrent users). So this may well be our setup (most likely it is) and not Serilog itself, but I need some help understanding how this can even happen.
The setup: a .NET Core 2.2 web API, a middleware that logs various things using Serilog, and a WCF proxy client with an IClientMessageInspector that logs requests and responses to the WCF service. Under high load in production, a WCF response is sometimes logged with context information from a different call. For example, we typically have fields.RequestPath=/api/Transaction/123456789/Last in the log context, and when we call the WCF service we log a message something like this:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Header>
    <Action s:mustUnderstand="1" xmlns="http://schemas.microsoft.com/ws/2005/05/addressing/none">http://tempuri.org/ILongRunning/DoWork</Action>
  </s:Header>
  <s:Body>
    <DoWork xmlns="http://tempuri.org/">
      <accountNumber>123456789</accountNumber>
    </DoWork>
  </s:Body>
</s:Envelope>
When we then log the response from WCF, we still have fields.RequestPath=/api/Transaction/123456789/Last, but the response can sometimes look like this:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Header />
  <s:Body>
    <DoWorkResponse xmlns="http://tempuri.org/">
      <DoWorkResult>Finished 987654321</DoWorkResult>
    </DoWorkResponse>
  </s:Body>
</s:Envelope>
Here DoWorkResult should have the value “Finished 123456789”. I just really can’t figure out how this context information gets mixed up like this.
The IClientMessageInspector
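using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;
using Serilog;

// LoggingClientMessageInspectorState is not shown here; it exposes the Log and Watch members used below.
internal class LoggingClientMessageInspector<T> : IClientMessageInspector
{
    private const string AfterMessageTemplate = "WCF reply received after {WCFElapsed} with reply {WCFReply}";
    private const string BeforeMessageTemplate = "WCF call {WCFRequest}";

    // Called when the reply arrives; logs through the logger stored in the correlation state.
    public void AfterReceiveReply(ref Message reply, object correlationState)
    {
        var state = (LoggingClientMessageInspectorState)correlationState;
        state.Log.Information(AfterMessageTemplate, state.Watch.Elapsed, reply.ToString());
    }

    // Called before the request is sent; the returned object is passed back to AfterReceiveReply.
    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        var correlationId = Guid.NewGuid();
        var log = Log.ForContext<T>().ForContext("WCFCorrelationId", correlationId);
        var correlationState = new LoggingClientMessageInspectorState(log, correlationId);
        log.Information(BeforeMessageTemplate, request.ToString());
        return correlationState;
    }
}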
Initialization of ChannelFactory and channel
// The ChannelFactory (and with it the logging endpoint behavior) is registered as a singleton.
services.AddSingleton<IChannelFactory<IWcfServiceChannel>>(provider =>
{
    var factory = new ChannelFactory<IWcfServiceChannel>(new BasicHttpBinding(), new EndpointAddress("uri_to_service"));
    var behavior = provider.GetRequiredService<ILoggingEndpointBehavior<IWcfServiceChannel>>();
    factory.Endpoint.EndpointBehaviors.Add(behavior);
    return factory;
}).AddScoped(provider =>
{
    // A new channel is created from the shared factory for each scope (request).
    var factory = provider.GetRequiredService<IChannelFactory<IWcfServiceChannel>>();
    return factory.CreateChannel(new EndpointAddress("uri_to_service"));
});
Please describe the expected behavior? Obviously, we don’t want the context mixed up with the wrong replies.
If the current behavior is a bug, please provide the steps to reproduce the issue and if possible a minimal demo of the problem. I’m not able to create a small repro, as the problem only seems to appear under high load.
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (4 by maintainers)
Sorry for the slow reply, but I’ve been on summer vacation 😃
We don’t have any concurrent calls internally in the service. However, given that we’re using async/await, I suppose the thread can be reused at some point while we are awaiting the response. I’m not 100% sure how all these parts work together at this point. I guess the LoggingClientMessageInspector is effectively a singleton as well, so it could be that your solution is what I need.
Either way, I’ll give this a go and see how it works out. I’ll report back once we get this stuff out in production for others to see if it works out.
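To illustrate the suspicion above about async/await and a shared inspector, here is a minimal sketch using only `AsyncLocal<T>` from the BCL as a stand-in for an ambient per-request log context. The class name `AmbientContextDemo` and the paths are hypothetical, and whether WCF actually invokes the inspector callback under another request's execution context is not established in this thread. The sketch only shows that a callback reads whatever ambient value belongs to the flow that invokes it, while a value captured up front stays with the callback.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AmbientContextDemo
{
    // Stand-in for an ambient, per-request value such as a log context property.
    private static readonly AsyncLocal<string> RequestPath = new AsyncLocal<string>();

    public static (string Ambient, string Captured) Run()
    {
        string ambient = null;
        string captured = null;
        Action callback = null;

        // "Request A" sets its ambient value and registers a callback.
        Task.Run(() =>
        {
            RequestPath.Value = "/api/A";
            var capturedPath = RequestPath.Value; // explicit capture at registration time
            callback = () =>
            {
                ambient = RequestPath.Value;      // resolved when the callback runs
                captured = capturedPath;          // travels with the closure
            };
        }).Wait();

        // "Request B" has its own ambient value and ends up invoking A's callback.
        Task.Run(() =>
        {
            RequestPath.Value = "/api/B";
            callback();
        }).Wait();

        return (ambient, captured);
    }

    public static void Main()
    {
        var result = Run();
        // prints "ambient=/api/B captured=/api/A"
        Console.WriteLine($"ambient={result.Ambient} captured={result.Captured}");
    }
}
```

The callback was registered by request A, but its ambient read returns request B's value, which is the same shape of mismatch described in the issue.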
@nblumhardt By all means, I’m quite sure we’re doing something wrong, but it really only manifests itself for Serilog and the log context. The service in question actually returns transactions for a netbank. It would be quite critical if we returned one customer’s transactions to a different customer, and we don’t do that for sure. I’m just a bit unsure how to figure it out at this point, as I feel there is some magic going on in Serilog that I don’t understand. Hence I came here hoping that someone might have some insight into what could even cause this. Obviously some instance of a logger is overwritten, but given that we basically use one middleware and fairly basic usage, it’s… just weird.
@adamchester We do this only in the middleware. We use ForContext(name, value) first and then LogContext.PushProperty in a using statement before calling the next middleware.
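For readers following along, the pattern described above could look roughly like the sketch below. The middleware name `RequestLoggingMiddleware`, the property names, and the message templates are assumptions for illustration; the actual middleware is not shown in this thread. It also assumes the logger is configured with `Enrich.FromLogContext()`, otherwise pushed properties are not attached to events.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Serilog;
using Serilog.Context;

// Hypothetical middleware; names and properties are illustrative only.
public class RequestLoggingMiddleware
{
    private readonly RequestDelegate _next;

    public RequestLoggingMiddleware(RequestDelegate next) => _next = next;

    public async Task Invoke(HttpContext context)
    {
        // ForContext attaches the property to this logger instance only.
        var log = Log.ForContext<RequestLoggingMiddleware>()
                     .ForContext("RequestMethod", context.Request.Method);

        // PushProperty makes the value ambient until the returned IDisposable is disposed.
        using (LogContext.PushProperty("RequestPath", context.Request.Path.Value))
        {
            log.Information("Request started");
            await _next(context);
            log.Information("Request finished with status {StatusCode}", context.Response.StatusCode);
        }
    }
}
```

The difference that matters for this issue is that the `ForContext` value travels with the `log` instance, while the pushed property is looked up from the ambient context at the moment each event is written.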
Anyway, my first thought was that the asynchronous nature of the IClientMessageInspector, coupled with async/await, was the issue. Since the inspector is effectively a singleton (because the ChannelFactory is a singleton), I think that may be the cause. I still find it a bit odd, though, as I thought I captured the log context properly, and for performance reasons I would like to keep the ChannelFactory as a singleton rather than scope it.
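One way to keep the ChannelFactory a singleton while still capturing the log context explicitly might be to snapshot the ambient context in BeforeSendRequest and bind it to the logger stored in the correlation state. The sketch below is not the fix suggested by the maintainers (that suggestion is not shown here) and has not been verified against the production scenario. It assumes the Serilog version in use exposes `LogContext.Clone()` and `ILogger.ForContext(ILogEventEnricher)`. The type names `CapturedState` and `CapturingClientMessageInspector<T>` are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;
using Serilog;
using Serilog.Context;
using Serilog.Core;

// Hypothetical correlation state: carries the logger and a stopwatch per call.
internal sealed class CapturedState
{
    public CapturedState(ILogger log)
    {
        Log = log;
        Watch = Stopwatch.StartNew();
    }

    public ILogger Log { get; }
    public Stopwatch Watch { get; }
}

internal class CapturingClientMessageInspector<T> : IClientMessageInspector
{
    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        // Snapshot of the ambient LogContext as it is at send time (assumes LogContext.Clone() is available).
        ILogEventEnricher snapshot = LogContext.Clone();

        // Bind the snapshot and a correlation id to a per-call logger.
        var log = Log.ForContext<T>()
                     .ForContext(snapshot)
                     .ForContext("WCFCorrelationId", Guid.NewGuid());

        log.Information("WCF call {WCFRequest}", request.ToString());
        return new CapturedState(log);
    }

    public void AfterReceiveReply(ref Message reply, object correlationState)
    {
        // Uses only the per-call logger, so the event carries the properties snapshotted at send time.
        var state = (CapturedState)correlationState;
        state.Log.Information(
            "WCF reply received after {WCFElapsed} with reply {WCFReply}",
            state.Watch.Elapsed,
            reply.ToString());
    }
}
```

The design choice here is to stop relying on whatever ambient context happens to be current when the reply callback runs. Whether properties bound this way take precedence over same-named ambient properties at write time should be checked against the Serilog version in use.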