
Broken Postgres clients are released back into the pool, but should be removed


Issue type:

[x] bug report

Database system/driver:

[x] postgres

It may impact other drivers if they have similar semantics/expectations to pg.Pool.

TypeORM version:

[x] latest

Explanation of the problem

Currently in TypeORM, if a client suffers an unrecoverable error - for example, if the underlying connection goes away during a DB failover - there is no protection in place to stop that client from being added back into the pg.Pool. The broken client will then be handed out again in the future, even though it will never be able to execute a successful query.

Although pg.Pool itself does listen for error events on clients within the pool - and actively removes any that emit errors - it doesn’t catch everything. It is considered the responsibility of the user of the pool to release known-broken clients by calling client.release(true); the truthy argument tells the pool to destroy the connection instead of adding it back to the pool.

https://node-postgres.com/api/pool#releaseCallback

The releaseCallback releases an acquired client back to the pool. If you pass a truthy value in the err position to the callback, instead of releasing the client to the pool, the pool will be instructed to disconnect and destroy this client, leaving a space within itself for a new client.

If a client’s connection does break, it’s very difficult to debug: a handful of queries will begin failing with an error like Error: Client has encountered a connection error and is not queryable, while others will continue to execute fine.
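For reference, the contract described above looks roughly like this when using pg directly (a minimal sketch, not TypeORM internals; the pool setup and the runQuery wrapper are illustrative only):

import { Pool } from 'pg'

const pool = new Pool()

async function runQuery(sql: string) {
  const client = await pool.connect()
  try {
    const result = await client.query(sql)
    client.release() // healthy client goes back into the pool
    return result
  } catch (err) {
    // passing a truthy value tells the pool to disconnect and destroy this client
    client.release(err as Error)
    throw err
  }
}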

There is further discussion about the impact of this in the node-postgres repo: https://github.com/brianc/node-postgres/issues/1942

Steps to reproduce or a small repository showing the problem

  • Set pool size to 1
  • Run a small loop which repeatedly performs a query, e.g.
import { getManager } from 'typeorm'

// simple delay helper used at the bottom of the loop
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

while (true) {
  try {
    // long-running query, so there is time to kill the backend mid-query
    await getManager().query('select pg_sleep(4)')
    console.log('success')
  } catch (err) {
    console.log('error', err)
  }

  console.log('done')
  await sleep(1000)
}
  • Wait for the query to run at least once. This should allocate a connection from the pool. Subsequent invocations of the query should use the same connection.
  • Kill the connection while the query is running. On macOS or Linux, this is easily done by finding the backend process, e.g. ps aux | grep postgres | grep SELECT, and killing it (an alternative SQL approach is sketched after this list)
  • The loop continues to throw errors, because the broken connection has been returned to the pool and keeps being handed out. This shouldn’t happen: we should have told pg.Pool the connection is broken so it can be removed from the pool, and the next query should get a new, unbroken connection
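If hunting down the OS process is awkward, the backend can also be terminated from a separate database session. A rough sketch (the extra pg client, its connection settings, and the query filter are assumptions for this test setup):

import { Client } from 'pg'

// hypothetical helper: terminate the sleeping backend from outside the TypeORM
// pool (which is busy running pg_sleep), using pg_terminate_backend
const killSleepingBackend = async () => {
  const admin = new Client() // connection settings come from PG* environment variables
  await admin.connect()
  await admin.query(`
    select pg_terminate_backend(pid)
    from pg_stat_activity
    where query like 'select pg_sleep%'
      and pid <> pg_backend_pid()
  `)
  await admin.end()
}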

I believe the reason this hasn’t been noticed before (at least not that I could see) is because it’s really only likely to happen if the actual database connection breaks. The majority of QueryFailedErrors are caused by dodgy SQL etc, none of which will render a client unusable. And, usually, if your database is killing connections, you’ve got other problems to think about 😅

We only noticed it because we run PgBouncer in between TypeORM and our Postgres server. When we redeployed PgBouncer, it would kill some of the active client connections in the pool, but because pg.Pool never found out about it, those connections remained in the pool indefinitely, causing a steady stream of errors even though everything else was fine.

Fix

I have a working fix here: https://github.com/loyaltylion/typeorm/commit/6bd52e04ffa5ba229874eecaac9c78b1628eb1ae

If this fix looks suitable, I’d be happy to create a PR to get this merged. It only applies to postgres, but could be extended to other drivers if we think they’d benefit.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 20
  • Comments: 25 (2 by maintainers)

Top GitHub Comments

10 reactions
AlexSun98 commented, Jun 23, 2021

any new updates?

5 reactions
michaelseibt commented, Apr 10, 2020

Any help now is greatly appreciated.

What worked for us is to check for broken connections at the beginning of every invocation. This will run a simple query on every client of the pool, and if that fails, will reconnect to the database.

Call this method from inside the event handler function.

import { getConnection } from 'typeorm'

const reconnectToDatabase = async () => {
  const connection = getConnection();
  // reach into the driver's underlying pg.Pool (private API, hence the any cast)
  const driver = connection.driver as any;
  for (const client of driver.master._clients) {
    try {
      // cheap liveness check against each pooled client
      await client.query('SELECT 1');
    } catch (error) {
      console.info('Reconnecting ...');
      // if any client is broken, tear down and rebuild the whole pool
      await getConnection().driver.disconnect();
      await getConnection().driver.connect();
      break;
    }
  }
}
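For context, a minimal sketch of wiring this into a handler as the comment suggests (the handler shape and name are assumed):

// hypothetical event handler that runs the liveness check before any queries
export const handler = async (event: unknown) => {
  await reconnectToDatabase()
  // ... normal TypeORM queries follow
}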

(FYI, @brianc - maybe that helps for the node-postgres side of the story 😃)
