Transaction wasn't committed in postgres but no error reported by npgsql

See original GitHub issue

Steps to reproduce

none 😞

The issue

This is weird, but we don’t know where else to turn except the npgsql driver. The following happened:

  • A POST request to create a resource was triggered and completed successfully within 211 ms (average time)
  • There is no trace of this resource in the database and no trace of it ever existing.

The resource itself was a payment, and we perform accounting for that payment. The way it works is that we have additional bookkeeping tables which keep records with a sequential number. There is no record of the payment, and no record is missing from the sequence. That part of the system is relatively stable and has countless e2e tests, and we’ve never seen anything like this before.

The whole request runs in a database transaction. This is the interceptor which intercepts the call to the ASP.NET Core MVC action and wraps it in that transaction:

public class DbTransactionAsyncInterceptor : AsyncInterceptorBase
{
    private readonly IEssentialContextWithTransactionSupport _context;
    private readonly HttpContext _httpContext;

    public DbTransactionAsyncInterceptor(
        IEssentialContextWithTransactionSupport context,
        IHttpContextAccessor contextAccessor)
    {
        _context = Check.NotNull(context, nameof(context));
        _httpContext = Check.NotNull(contextAccessor, nameof(contextAccessor)).HttpContext;
    }

    protected override async Task<TResult> InterceptAsync<TResult>(IInvocation invocation, Func<IInvocation, Task<TResult>> proceed)
    {
        if (!ShouldIntercept(invocation))
        {
            return await proceed(invocation);
        }

        bool isReadonlyRequest = IsReadonlyRequest();

        string httpMethod = _httpContext.Request.Method;
        string httpPath = _httpContext.Request.Path;

        // a wrapper for Database.CreateExecutionStrategy().ExecuteAsync(RunOperationWithinTransaction) with retry strategy
        return await _context.ExecuteWithinTransactionAsync(
            async transaction =>
            {
                if (isReadonlyRequest)
                {
                    // results in Database.ExecuteSqlCommandAsync("SET TRANSACTION READ ONLY");
                    await _context.SetTransactionToReadOnlyAsync();
                }

                var result = await proceed(invocation);

                if (!isReadonlyRequest)
                {
                    transaction.Commit();
                }

                return result;
            }, $"{httpMethod} - {httpPath}");
    }

    private bool ShouldIntercept(IInvocation invocation)
        => !_context.IsTransactionRunning && invocation.Method.IsAsyncAction();

    private bool IsReadonlyRequest() => IsRequestMethodAnyOf("GET", "HEAD");

    private bool IsRequestMethodAnyOf(params string[] requestMethods)
    {
        string method = _httpContext.Request.Method?.ToUpper();

        return requestMethods?.Any(m => m.ToUpper() == method) ?? false;
    }
}

The way this should work (and has always worked so far) is:

  • When there are any issues processing any part of the request, we roll back the whole request and return the error
  • Whenever we see a successful result, we are guaranteed that the transaction completed successfully
  • There is a slight chance that the transaction is committed and some middleware further down the line then fails, causing a 500. We don’t recall this ever happening in the past months, but it’s possible.

The weird thing that happened: the action response was 200, and we know the related resources as well as the ID, but we can neither find a trace of the ID in the DB nor do the related resources seem affected. We’re also sourcing related tables via Debezium to be processed by other services, and it didn’t show up there either.

There were no errors during this time in our system or infrastructure.

So the only explanation we can imagine is that transaction.Commit(); failed on Postgres but no exception was thrown by npgsql.

I know this is very vague, but unfortunately we no longer have the Postgres logs for that period to confirm that suspicion. We have also never seen anything like this before: data going missing inexplicably. Do you have any other ideas about what could cause an undetected “spontaneous rollback”?

We retried the operation with exactly the same parameters today and it worked.

Further technical details

Npgsql version: 4.1.2
PostgreSQL version: AWS RDS Postgres 11
Operating system: Docker image FROM mcr.microsoft.com/dotnet/core/aspnet:3.1

Other details about my project setup:

The connection string: UserId=...;Password=...;Server=...;Port=5432;Database=...;Maximum Pool Size=500;Max Auto Prepare=500;

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

2 reactions
roji commented, Feb 23, 2020

Once a transaction is in a failed state, any statement will fail until it is rolled back - including SELECT 1. So what you propose will work, although it would add an extra database roundtrip for every single transaction, which is quite bad for perf. I’d consider refactoring things so that exceptions bubble up instead, at which point you’re aware if the transaction is failed or not.
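The probe described in the comment above could look roughly like the following. This is only a sketch, not code from the thread: `TransactionProbe` and `CommitCheckedAsync` are hypothetical names, and the caller is assumed to own an open `NpgsqlConnection` and the `NpgsqlTransaction` started on it. If the transaction is already aborted, the `SELECT 1` should fail with a `PostgresException` (SQLSTATE 25P02, in_failed_sql_transaction) before the commit is attempted.

```csharp
using System.Threading.Tasks;
using Npgsql;

// Hypothetical helper, not part of Npgsql.
public static class TransactionProbe
{
    public static async Task CommitCheckedAsync(
        NpgsqlConnection connection, NpgsqlTransaction transaction)
    {
        // In an aborted transaction every statement fails, so this throws
        // instead of letting COMMIT quietly end in a rollback.
        using (var probe = new NpgsqlCommand("SELECT 1", connection, transaction))
        {
            await probe.ExecuteScalarAsync();
        }

        // Only reached when the probe succeeded.
        await transaction.CommitAsync();
    }
}
```

As the comment points out, this costs one extra database roundtrip per transaction, so it is better suited as a diagnostic than as a permanent fix.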

2 reactions
roji commented, Feb 23, 2020

@pgrm the connection’s transaction state currently isn’t exposed to the user in any way, so you’ll have to make sure any exceptions that happen must bubble up and be caught.

For now I’m going to close this issue since there’s nothing immediately actionable here, but I’m definitely following (and engaging) on that thread. We may end up doing something in Npgsql to help users manage this.
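To illustrate the advice about letting exceptions bubble up, here is a minimal sketch contrasting a helper that swallows a database error with one that lets it propagate. The class and method names are hypothetical, and `sql` stands in for whatever statement the application runs; none of this is taken from the reporter's code base. The relevant PostgreSQL behavior is that once a statement fails, the transaction is aborted, and a later COMMIT is answered with a rollback rather than an error.

```csharp
using System.Threading.Tasks;
using Npgsql;

// Hypothetical helpers for illustration only.
public static class TransactionalWrites
{
    // Anti-pattern: the catch hides the failure, but PostgreSQL has already
    // marked the transaction as aborted, so a later COMMIT ends in a rollback.
    public static async Task<bool> TryExecuteSwallowingAsync(
        NpgsqlConnection connection, NpgsqlTransaction transaction, string sql)
    {
        try
        {
            using (var command = new NpgsqlCommand(sql, connection, transaction))
            {
                await command.ExecuteNonQueryAsync();
            }
            return true;
        }
        catch (PostgresException)
        {
            // The caller may ignore this result and still reach Commit().
            return false;
        }
    }

    // Preferred: no catch here, so the exception reaches the code that owns
    // the transaction, which can then skip the commit and roll back.
    public static async Task ExecuteAsync(
        NpgsqlConnection connection, NpgsqlTransaction transaction, string sql)
    {
        using (var command = new NpgsqlCommand(sql, connection, transaction))
        {
            await command.ExecuteNonQueryAsync();
        }
    }
}
```

With the second form, an interceptor like the one in the issue never reaches `transaction.Commit()` after a failed statement, so a 200 response is only returned when the commit was attempted on a healthy transaction.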

