Idle connection pruning kills too many connections.

Steps to reproduce

Run the test program at the bottom of this post.

The issue

The idle connection pruning kills connections that have recently been busy, which causes performance issues because new (non-pooled) connections must be opened. I created a test program that runs simple queries (select null), possibly in parallel on multiple connections, and tracks how many pooled and non-pooled connections were used. With a provider built from the current hotfix/4.1.3 (f615edb5229a34a29e2a8cba97a1b482d1f59370), it produces the following output:

ConnectionPruningInterval: 2
ConnectionIdleLifetime: 10

Running simple test case with 1/1/1 connections.
Clearing all connection pools.
Running 1 parallel queries... pooled: 0/1
Waiting for 8 seconds.
Running 1 parallel queries... pooled: 1/1
Waiting for 4 seconds.
Running 1 parallel queries... pooled: 0/1

Running simple test case with 2/1/2 connections.
Clearing all connection pools.
Running 2 parallel queries... pooled: 0/2
Waiting for 8 seconds.
Running 1 parallel queries... pooled: 1/1
Waiting for 4 seconds.
Running 2 parallel queries... pooled: 0/2

Running simple test case with 1/2/2 connections.
Clearing all connection pools.
Running 1 parallel queries... pooled: 0/1
Waiting for 8 seconds.
Running 2 parallel queries... pooled: 1/2
Waiting for 4 seconds.
Running 2 parallel queries... pooled: 1/2

As you can see, in the first two tests (1/1/1 and 2/1/2) the provider prunes all connections between the 8th second and the 12th second (probably at the 10th second), even though one of the connections has been used at the 8th second. In the third case it prunes one of the connections, even though both have been used at the 8th second.
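
To make the expectation concrete, here is a minimal sketch of a per-connection idle check. It is a hypothetical model, not Npgsql's actual pool code: `IdlePruningModel`, `ConnectionsToPrune`, and the `lastReleasedAt` timestamps are names assumed for illustration. It assumes a connection only becomes eligible for pruning once that connection itself has been idle for ConnectionIdleLifetime seconds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the expected behavior; NOT Npgsql's implementation.
public static class IdlePruningModel
{
    // Returns the indices of the connections that a per-connection idle check
    // would prune at time `now`. All times are in seconds since the test started.
    public static List<int> ConnectionsToPrune(
        IReadOnlyList<double> lastReleasedAt, double now, double idleLifetime)
    {
        var toPrune = new List<int>();
        for (int i = 0; i < lastReleasedAt.Count; i++)
        {
            // A connection is stale only if it has itself been idle long enough.
            if (now - lastReleasedAt[i] >= idleLifetime)
            {
                toPrune.Add(i);
            }
        }
        return toPrune;
    }

    public static void Main()
    {
        const double idleLifetime = 10; // ConnectionIdleLifetime from the test

        // Test case 1/1/1: the single connection was last returned to the pool
        // at roughly the 8th second, so the pruning tick at the 10th second
        // sees only about 2 seconds of idle time.
        var oneConnection = new[] { 8.0 };
        Console.WriteLine(
            $"1/1/1, tick at 10s: {ConnectionsToPrune(oneConnection, 10, idleLifetime).Count} pruned");

        // Test case 1/2/2: both connections were last returned at roughly the
        // 8th second, so neither should be pruned before the final queries.
        var twoConnections = new[] { 8.0, 8.0 };
        Console.WriteLine(
            $"1/2/2, tick at 12s: {ConnectionsToPrune(twoConnections, 12, idleLifetime).Count} pruned");
    }
}
```

Under this model the final step would report pooled: 1/1, 1/2 and 2/2 for the three test cases (the 2/1/2 case keeps only the connection that was reused at the 8th second), rather than the 0/1, 0/2 and 1/2 shown above.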

Further technical details

Npgsql version: Locally built package at git hash f615edb5229a34a29e2a8cba97a1b482d1f59370.
PostgreSQL version: PostgreSQL 11.4
Operating system: Server: Alpine Linux docker image / Client: Windows 10

The code of the test program:

using Npgsql;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace NugetConnectionTimeouts
{
    public class Program
    {
        private const string Username = "REPLACE_ME";
        private const string Password = "REPLACE_ME";
        private const string Host =     "REPLACE_ME";
        private const string Database = "REPLACE_ME";

        private const int ConnectionPruningInterval = 2;
        private const int ConnectionIdleLifetimeFactor = 5;
        private const int ConnectionIdleLifetime = ConnectionPruningInterval * ConnectionIdleLifetimeFactor;

        public static void Main(string[] args)
        {
            RunTestCasesAsync().GetAwaiter().GetResult();
        }

        public static async Task RunTestCasesAsync()
        {
            Console.WriteLine($"ConnectionPruningInterval: {ConnectionPruningInterval}\nConnectionIdleLifetime: {ConnectionIdleLifetime}");
            await RunSimpleTestCaseAsync(1, 1, 1);
            await RunSimpleTestCaseAsync(2, 1, 2);
            await RunSimpleTestCaseAsync(1, 2, 2);
        }

        private static async Task RunSimpleTestCaseAsync(int initialCount, int intermediateCount, int finalCount)
        {
            Console.WriteLine();
            Console.WriteLine($"Running simple test case with {initialCount}/{intermediateCount}/{finalCount} connections.");
            Console.WriteLine("Clearing all connection pools.");
            NpgsqlConnection.ClearAllPools();
            await RunTestQueriesAsync(initialCount);
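            // Wait until just before the idle lifetime expires, then reuse connections.
            await WaitThenRunTestQueriesAsync(ConnectionPruningInterval * (ConnectionIdleLifetimeFactor - 1), intermediateCount);
            // Wait past the original idle deadline and check which connections are still pooled.
            await WaitThenRunTestQueriesAsync(ConnectionPruningInterval * 2, finalCount);
        }

        private static async Task WaitThenRunTestQueriesAsync(int secondsToWait, int connectionCount)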
        {
            Console.WriteLine($"Waiting for {secondsToWait} seconds.");
            await Task.Delay(TimeSpan.FromSeconds(secondsToWait));
            await RunTestQueriesAsync(connectionCount);
        }

        private static async Task RunTestQueriesAsync(int connectionCount)
        {
            Console.Write($"Running {connectionCount} parallel queries...");
            string connectionString = new NpgsqlConnectionStringBuilder
            {
                Username = Username,
                Host = Host,
                Database = Database,
                ConnectionPruningInterval = ConnectionPruningInterval,
                ConnectionIdleLifetime = ConnectionIdleLifetime
            }.ConnectionString;
            int pooledConnectionCount = connectionCount;
            ProvidePasswordCallback callback = (string host, int port, string database, string username) =>
            {
                Interlocked.Decrement(ref pooledConnectionCount);
                return Password;
            };
            var connections = new List<NpgsqlConnection>();
            try
            {
                for (int i = 0; i < connectionCount; ++i)
                {
                    var connection = new NpgsqlConnection(connectionString);
                    connection.ProvidePasswordCallback = callback;
                    connections.Add(connection);
                }
                await Task.WhenAll(connections.Select(x => x.OpenAsync()));
                await Task.WhenAll(connections.Select(async x =>
                {
                    using (var command = x.CreateCommand())
                    {
                        command.CommandText = "select null";
                        await command.ExecuteScalarAsync();
                    }
                }));
            }
            finally
            {
                foreach (var connection in connections)
                {
                    await connection.DisposeAsync();
                }
            }
            Console.WriteLine($" pooled: {pooledConnectionCount}/{connectionCount}");
        }
    }
}

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 21 (11 by maintainers)

Top GitHub Comments

roji commented, Oct 26, 2020 (1 reaction)

@Toxantron Npgsql 5.0 will be released at the same time as .NET 5.0, which means November 10th (two weeks away!). This is why I’m pushing issues out of the release 😃 I’ve also updated the milestone with the date, thanks.

However, nothing is stopping us from making a 5.1 release at some point after 5.0, without waiting for .NET 6.0.

Kharos commented, Oct 26, 2020 (1 reaction)

@roji I don’t have a concrete proposal yet. I agree with your points, especially that #2929 could be generally useful. That feature would probably be useful for me even if the pool was fully optimized for my “sparse usage” case, as it will improve the behavior when load suddenly increases.
