Transaction wasn't committed in postgres but no error reported by npgsql
Steps to reproduce
none
The issue
This is weird, but we don't know where else to turn except the npgsql driver. The following happened:
- A POST request to create a resource was triggered and completed successfully within 211 ms (average time).
- There is no trace of this resource in the database and no trace of it ever existing. The resource itself was a payment, and we perform accounting for that payment - the way it works is that we have additional bookkeeping tables which keep records with a sequential number. There is no record of the payment, and no record is missing from the sequence.
- That part of the system is relatively stable and has countless e2e tests, and we've never seen anything like this before.
The whole request runs in a database transaction. This is the interceptor which intercepts the call to the ASP.NET Core MVC action and wraps it in the database transaction:
public class DbTransactionAsyncInterceptor : AsyncInterceptorBase
{
    private readonly IEssentialContextWithTransactionSupport _context;
    private readonly HttpContext _httpContext;

    public DbTransactionAsyncInterceptor(
        IEssentialContextWithTransactionSupport context,
        IHttpContextAccessor contextAccessor)
    {
        _context = Check.NotNull(context, nameof(context));
        _httpContext = Check.NotNull(contextAccessor, nameof(contextAccessor)).HttpContext;
    }

    protected override async Task<TResult> InterceptAsync<TResult>(IInvocation invocation, Func<IInvocation, Task<TResult>> proceed)
    {
        if (!ShouldIntercept(invocation))
        {
            return await proceed(invocation);
        }

        bool isReadonlyRequest = IsReadonlyRequest();
        string httpMethod = _httpContext.Request.Method;
        string httpPath = _httpContext.Request.Path;

        // a wrapper for Database.CreateExecutionStrategy().ExecuteAsync(RunOperationWithinTransaction) with retry strategy
        return await _context.ExecuteWithinTransactionAsync(
            async transaction =>
            {
                if (isReadonlyRequest)
                {
                    // results in Database.ExecuteSqlCommandAsync("SET TRANSACTION READ ONLY");
                    await _context.SetTransactionToReadOnlyAsync();
                }

                var result = await proceed(invocation);

                if (!isReadonlyRequest)
                {
                    transaction.Commit();
                }

                return result;
            }, $"{httpMethod} - {httpPath}");
    }

    private bool ShouldIntercept(IInvocation invocation)
        => !_context.IsTransactionRunning && invocation.Method.IsAsyncAction();

    private bool IsReadonlyRequest() => IsRequestMethodAnyOf("GET", "HEAD");

    private bool IsRequestMethodAnyOf(params string[] requestMethods)
    {
        string method = _httpContext.Request.Method?.ToUpper();
        return requestMethods?.Any(m => m.ToUpper() == method) ?? false;
    }
}
The way this should work (and so far always worked) is:
- When there are any issues processing any part of the request, we roll back the whole request and return the error
- Whenever we see a successful result, we are guaranteed that the transaction completed successfully
- There is a slight chance that the transaction was committed and some middleware down the line then failed, causing a 500 - we don't recall this ever happening in the past months, but it's possible
The weird thing that happened - the action response was 200, and we know the related resources as well as the ID - but we can neither find a trace of the ID in the DB, nor do the related resources seem affected.
We're also sourcing related tables via Debezium to be processed by other services, and it didn't show up there either.
There were no errors during this time in our system or infrastructure.
So the only explanation we can imagine is that transaction.Commit(); failed on postgres but no exception was thrown by npgsql.
I know this is very vague, but unfortunately we no longer have the postgres logs for that period to confirm that suspicion. We have also never seen anything like this before - an inexplicable loss of data.
Do you have any other ideas about what could cause an undetected "spontaneous rollback"?
We retried the operation with exactly the same parameters today and it worked.
Further technical details
Npgsql version: 4.1.2
PostgreSQL version: AWS RDS Postgres 11
Operating system: Docker image FROM mcr.microsoft.com/dotnet/core/aspnet:3.1
Other details about my project setup:
The connection string: UserId=...;Password=...;Server=...;Port=5432;Database=...;Maximum Pool Size=500;Max Auto Prepare=500;
Issue Analytics
- Created 4 years ago
- Comments: 10 (6 by maintainers)
Once a transaction is in a failed state, any statement will fail until it is rolled back - including SELECT 1. So what you propose will work, although it would add an extra database roundtrip for every single transaction, which is quite bad for perf. I'd consider refactoring things so that exceptions bubble up instead, at which point you know whether the transaction has failed or not.
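The probe discussed in this comment can be sketched roughly as follows. This is only an illustration under assumptions, not an Npgsql or EF Core API: `TransactionProbe` and `CommitVerifiedAsync` are hypothetical names, and the sketch assumes the code can reach the underlying `DbTransaction` (with EF Core, for example, through `GetDbTransaction()`). It relies on the behavior described above: in a failed transaction PostgreSQL rejects every statement (SQLSTATE 25P02), so the `SELECT 1` throws, whereas a COMMIT issued in that state is answered with a rollback and no error.

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, shown for illustration only.
public static class TransactionProbe
{
    // Runs a cheap statement inside the transaction before committing.
    // If the transaction is already in the failed state, the probe throws
    // and Commit() is never reached.
    public static async Task CommitVerifiedAsync(
        DbTransaction transaction,
        CancellationToken cancellationToken = default)
    {
        if (transaction == null) throw new ArgumentNullException(nameof(transaction));

        DbConnection connection = transaction.Connection
            ?? throw new InvalidOperationException("The transaction is no longer active.");

        using (DbCommand probe = connection.CreateCommand())
        {
            probe.Transaction = transaction;
            probe.CommandText = "SELECT 1";

            // Costs one extra roundtrip per transaction.
            await probe.ExecuteScalarAsync(cancellationToken);
        }

        transaction.Commit();
    }
}
```

In the interceptor from the issue, a call like this would take the place of `transaction.Commit()`, at the price of the extra roundtrip mentioned above.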
@pgrm the connection's transaction state currently isn't exposed to the user in any way, so you'll have to make sure that any exceptions that happen bubble up and are caught.
For now I'm going to close this issue since there's nothing immediately actionable here, but I'm definitely following (and engaging) on that thread. We may end up doing something in Npgsql to help users manage this.
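As a small sketch of what letting exceptions bubble up can look like: the wrapper below logs a failure and then rethrows it, so the code that owns the transaction sees the error and skips the commit. `StepRunner` and `RunLoggedAsync` are hypothetical names; nothing in the issue confirms that an exception was actually swallowed somewhere, so this only illustrates the pattern being recommended.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, shown for illustration only.
public static class StepRunner
{
    // Runs one unit of work inside an ambient transaction.
    // Failures are logged but never swallowed: a swallowed database error
    // would leave the transaction in the failed state while the caller
    // carries on and eventually calls Commit().
    public static async Task RunLoggedAsync(Func<Task> step, Action<Exception> log)
    {
        try
        {
            await step();
        }
        catch (Exception ex)
        {
            log(ex);
            throw; // preserve the original stack trace and let the caller roll back
        }
    }
}
```

Catching `Exception` rather than a specific database exception type is deliberate here: data-access layers may wrap the provider's exception in their own type, so filtering on one type risks missing the error.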