Suspected race condition in NpgsqlConnector.EndUserAction

Steps to reproduce

Unfortunately, I have not yet been able to identify steps that reproduce this problem. We only know that it occurs most often when the database server itself is experiencing problems.

The issue

From time to time I see an error with the following content:

Exception type: System.Threading.SemaphoreFullException
Exception message: Adding the specified count to the semaphore would cause it to exceed its maximum count.
Stack trace: 
at System.Threading.SemaphoreSlim.Release(Int32 releaseCount)
   at Npgsql.NpgsqlConnector.EndUserAction()
   at Npgsql.NpgsqlDataReader.Cleanup(Boolean async, Boolean connectionClosing)
   at Npgsql.NpgsqlDataReader.Close(Boolean connectionClosing, Boolean async)
   at Npgsql.NpgsqlConnector.CloseOngoingOperations(Boolean async)
   at Npgsql.NpgsqlConnection.<Close>g__CloseOngoingOperationAndFinish|76_0(NpgsqlConnection connection, NpgsqlConnector connector, Boolean async)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.CloseAsync()
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.CloseAsync()
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.CleanupCommandAsync(DbCommand command, IRelationalConnection connection)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(DbContext _, Boolean result, CancellationToken cancellationToken)
   at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()

After that, dotnet-counters shows an increased number of busy connections. However, pg_stat_activity shows that there are actually no “busy” connections; there are only idle connections that are no longer used for keepalive.
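
For context on the exception above, here is a minimal sketch of how SemaphoreSlim.Release produces a SemaphoreFullException. It assumes a binary semaphore created with an initial and maximum count of 1; the name userLock and the helper SecondReleaseThrows are illustrative and are not taken from Npgsql's source. It only shows that a release without a matching wait is enough to trigger the error, not how Npgsql gets into that situation.

```csharp
using System;
using System.Threading;

public static class Program
{
    // Returns true if releasing an already-full semaphore throws.
    // Assumption: the lock is binary (initialCount = 1, maxCount = 1).
    public static bool SecondReleaseThrows(SemaphoreSlim userLock)
    {
        userLock.Wait();     // start of a "user action": count 1 -> 0
        userLock.Release();  // end of the action: count 0 -> 1

        try
        {
            userLock.Release();  // second release with no matching Wait
            return false;
        }
        catch (SemaphoreFullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var userLock = new SemaphoreSlim(1, 1);
        Console.WriteLine(SecondReleaseThrows(userLock)); // True
        Console.WriteLine(userLock.CurrentCount);         // 1
    }
}
```

The failed Release leaves the count unchanged, so the semaphore itself stays usable; whatever state the caller was about to update after the release is what may be left inconsistent.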

Further technical details

Npgsql version: 4.1.6
Microsoft.EntityFrameworkCore version: 3.1.10
PostgreSQL version: PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
Operating system: Linux centos-release-7-5.1804.el7.centos.x86_64

Other details about my project setup:

Keepalive is enabled.
ConnectionString: User ID=user;Password=password;Host=host;Port=5432;Database=dbname;Keepalive=1;Pooling=true;Minimum Pool Size=69;Maximum Pool Size=70;Command Timeout=300;Application Name=AppWithProblems;Max Auto Prepare=150;

I looked at the code, and the only place I have doubts about is NpgsqlConnector.EndUserAction:

if (_isKeepAliveEnabled)
{
    lock (this)
    {
        if (IsReady || !IsConnected)
            return;

        var keepAlive = Settings.KeepAlive * 1000;
        _keepAliveTimer!.Change(keepAlive, keepAlive);

        Log.Trace("End user action", Id);
        _currentCommand = null;
        _userLock!.Release();
        State = ConnectorState.Ready;
    }
}

Specifically the following code:

if (IsReady || !IsConnected)
    return;

Here, two comparisons are made against a variable that can potentially change after the first comparison has been made but before the second. Both properties (IsReady and IsConnected) compare the State property to ConnectorState enum members. The State property is declared as:

internal ConnectorState State
{
    get => (ConnectorState)_state;
    set
    {
        var newState = (int)value;
        if (newState == _state)
            return;
        Interlocked.Exchange(ref _state, newState);
    }
}

What do you think about that? Could there be a race condition here, or am I wrong?

Would a fix like this help here?

//NpgsqlConnector.EndUserAction:
//...
if (_isKeepAliveEnabled)
{
    lock (this)
    {
        var state = State;
        if (state == ConnectorState.Ready || 
           state == ConnectorState.Closed ||
           state == ConnectorState.Connecting ||
           state == ConnectorState.Broken) return;
//...
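
To make the concern concrete, here is a small deterministic sketch comparing the two styles of check. Everything in it is hypothetical: ScriptedConnector is a stand-in that returns a scripted sequence of values from State to imitate a write landing between two reads, its ConnectorState enum is a reduced subset, and its IsConnected is simplified. It does not show that this interleaving actually happens in Npgsql, only that the two checks can disagree if it does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, reduced stand-in types; not Npgsql's actual code.
public enum ConnectorState { Closed, Connecting, Ready, Executing, Broken }

public class ScriptedConnector
{
    readonly Queue<ConnectorState> _reads;

    public ScriptedConnector(params ConnectorState[] reads)
        => _reads = new Queue<ConnectorState>(reads);

    // Each read consumes the next scripted value, as if another thread
    // had written State in between; the last value then stays in place.
    public ConnectorState State => _reads.Count > 1 ? _reads.Dequeue() : _reads.Peek();

    public bool IsReady => State == ConnectorState.Ready;

    public bool IsConnected
    {
        get
        {
            var s = State; // a single read inside this property
            return s == ConnectorState.Ready || s == ConnectorState.Executing;
        }
    }

    // Same shape as the check in question: two independent reads of State.
    public bool ShouldSkipTwoReads() => IsReady || !IsConnected;

    // Same shape as the proposed fix: one read, compared against each state.
    public bool ShouldSkipSnapshot()
    {
        var s = State;
        return s == ConnectorState.Ready || s == ConnectorState.Closed
            || s == ConnectorState.Connecting || s == ConnectorState.Broken;
    }
}

public static class Program
{
    public static void Main()
    {
        // Scripted interleaving: the first read sees Connecting, the second sees Ready.
        var twoReads = new ScriptedConnector(ConnectorState.Connecting, ConnectorState.Ready);
        Console.WriteLine(twoReads.ShouldSkipTwoReads()); // False: would go on to Release()

        var snapshot = new ScriptedConnector(ConnectorState.Connecting, ConnectorState.Ready);
        Console.WriteLine(snapshot.ShouldSkipSnapshot()); // True: returns early
    }
}
```

Note that a single snapshot only makes the check internally consistent. State can still change after the snapshot is taken, unless every writer also takes the same lock.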

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 18 (18 by maintainers)

Top GitHub Comments

vonzshik commented, Feb 12, 2021

@neyromant just something that came to my mind.

at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.CleanupCommandAsync(DbCommand command, IRelationalConnection connection)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)

These frames from your stack trace are peculiar, as this should happen only if there was an exception while executing a reader (refs). That leads me to believe that we get an exception which Npgsql doesn’t handle well, leaving the connection in a bad state. After that, we get another exception during cleanup, which swallows the original one. I think you should enable EF Core logging, so we can get a clear picture of the original exception.
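
As a generic illustration of the "swallowed" exception described here, the sketch below shows how an exception thrown by cleanup code inside a catch block replaces the original one, and one way to keep both. The methods Execute, Cleanup, RunMasking and RunPreserving are hypothetical stand-ins and are not EF Core or Npgsql APIs.

```csharp
using System;

public static class Program
{
    // Hypothetical stand-ins for "execute the reader" and "clean up the command".
    static void Execute() => throw new InvalidOperationException("original failure while executing");
    static void Cleanup() => throw new TimeoutException("secondary failure during cleanup");

    // Cleanup runs inside the catch block. If it throws, the caller only
    // ever sees the cleanup exception; the original one is lost.
    public static void RunMasking()
    {
        try { Execute(); }
        catch
        {
            Cleanup();
            throw; // not reached when Cleanup throws
        }
    }

    // Variant that surfaces both failures when cleanup also throws.
    public static void RunPreserving()
    {
        try { Execute(); }
        catch (Exception original)
        {
            try { Cleanup(); }
            catch (Exception cleanupError)
            {
                throw new AggregateException(original, cleanupError);
            }
            throw;
        }
    }

    public static void Main()
    {
        try { RunMasking(); }
        catch (Exception e) { Console.WriteLine(e.GetType().Name); } // TimeoutException

        try { RunPreserving(); }
        catch (AggregateException e) { Console.WriteLine(e.InnerExceptions.Count); } // 2
    }
}
```

When changing the calling code is not an option, logging the first exception before cleanup runs has the same diagnostic effect, which is what enabling logging would achieve here.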

vonzshik commented, Feb 5, 2021

I did attempt to replicate the issue, but so far I have not run into any problems :(
