node-postgres handles server disconnection differently on macOS and Linux

Hello, the company I work for has been using pg-promise for our database service and we’ve run into an issue with failover, which we believe is an error in the underlying node-postgres library. We believe the issue is due to the way Linux handles socket timeout events differently to macOS.

Steps to reproduce:

  • Connect to a Postgres server. We used one hosted in AWS RDS with MultiAZ failover enabled.
  • Run a query every N seconds
  • Reboot the server such that the connection is dropped without a TCP FIN packet. We did a reboot with failover in AWS RDS.

Note: we believe this scenario is not specific to RDS, but rather any network outage or server failure which does not send a TCP FIN packet.
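
The "run a query every N seconds" step could be sketched roughly as below. This is a minimal illustration, not the reporter's actual script: `probeOnce` and `startProbing` are hypothetical helper names, and `pool` is assumed to be any object with a pg.Pool-style `query(text)` method that returns a promise.

```javascript
// Sketch of the reproduction loop. `pool` is assumed to expose a
// pg.Pool-style `query(text)` method that returns a promise.
async function probeOnce(pool) {
  const startedAt = Date.now();
  try {
    await pool.query('SELECT 1');
    return { ok: true, elapsedMs: Date.now() - startedAt };
  } catch (err) {
    // Report the failure instead of throwing so the loop keeps running.
    return { ok: false, elapsedMs: Date.now() - startedAt, message: err.message };
  }
}

// Run a probe every `intervalMs` milliseconds.
// Returns a function that stops the loop.
function startProbing(pool, intervalMs, onResult = console.log) {
  const timer = setInterval(() => {
    probeOnce(pool).then(onResult);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

With node-postgres installed, something like `startProbing(new (require('pg').Pool)(), 5000)` would log one result object every five seconds, which makes it easy to see when queries start failing and whether they recover after the failover.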

Expected outcome and actual outcome on macOS:

  • The next query fails and the failing client is removed from the pool.
  • Subsequent queries use a new client which tries to establish a fresh connection.
  • When the server reboot/failover is complete, these queries will succeed.

Actual outcome on Linux:

  • The next query fails, but the bad client is not removed from the pool.
  • Subsequent queries try to re-use the bad client and fail even after the reboot/failover is complete.

Detailed order of events

macOS

  • Successful query
  • Reboot DB
  • DB stops listening on original IP
  • Client begins a further query
  • TCP sends query, does not receive an ACK
  • TCP begins retransmission, does not receive an ACK
  • Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
  • Client rejects query promise with “Error: read ETIMEDOUT”
  • Immediately after the error the “connection” remains in node-pg’s pool
  • Almost immediately afterwards the pool emits an “error” event
  • The “connection” is removed from node-pg’s pool
  • Client begins another query
  • DNS fetches the new IP
  • TCP successfully submits and retrieves the query to the new IP

Linux

  • Successful query
  • Reboot DB
  • DB stops listening on original IP
  • Client begins a further query
  • TCP sends query, does not receive an ACK
  • TCP begins retransmission, does not receive an ACK
  • Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
  • Client rejects query promise with “Error: Connection terminated unexpectedly”
    error Error: Connection terminated unexpectedly
      at Connection.con.once (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:235:9)
      at Object.onceWrapper (events.js:313:30)
      at emitNone (events.js:106:13)
      at Connection.emit (events.js:208:7)
      at Socket.<anonymous> (/src/node_modules/pg-promise/node_modules/pg/lib/connection.js:131:10)
      at emitNone (events.js:111:20)
      at Socket.emit (events.js:208:7)
      at endReadableNT (_stream_readable.js:1056:12)
      at _combinedTickCallback (internal/process/next_tick.js:138:11)
      at process._tickDomainCallback (internal/process/next_tick.js:218:9)
    
  • After the error, the “connection” remains in node-pg’s pool
  • Subsequent queries fail immediately without sending any data with the following error:
    error Error: Client has encountered a connection error and is not queryable
      at process.nextTick (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:500:25)
      at _combinedTickCallback (internal/process/next_tick.js:131:7)
      at process._tickDomainCallback (internal/process/next_tick.js:218:9)
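
One way to observe the difference between the two sequences above is to log the pool's events and client counts while the failover happens. The sketch below assumes `pool` is a pg.Pool, which is an EventEmitter that emits "error" and "remove" events and exposes `totalCount` and `idleCount`; `watchPool` is a hypothetical helper name.

```javascript
// Sketch: log pool-level events together with the current client counts.
// `pool` is assumed to be a pg.Pool (an EventEmitter with totalCount/idleCount).
function watchPool(pool, log = console.log) {
  const snapshot = () => `total=${pool.totalCount} idle=${pool.idleCount}`;
  // Emitted when an idle client in the pool hits a connection error.
  pool.on('error', (err) => log(`pool error: ${err.message} (${snapshot()})`));
  // Emitted when a client is closed and removed from the pool.
  pool.on('remove', () => log(`client removed (${snapshot()})`));
  return snapshot;
}
```

On macOS, the order of events described above would be expected to produce a "pool error" line followed by a "client removed" line; on Linux, the absence of the "client removed" line would be consistent with the bad client staying in the pool.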
    

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 24 (19 by maintainers)

Top GitHub Comments

2 reactions
sehrope commented, Aug 8, 2019

The pool is not aware of the connection status. It’s up to the user to inform the pool that the connection is either still valid (via client.release()) or invalid (via client.release(err)).

From the docs:

The releaseCallback releases an acquired client back to the pool. If you pass a truthy value in the err position to the callback, instead of releasing the client to the pool, the pool will be instructed to disconnect and destroy this client, leaving a space within itself for a new client.

Using pool.query(...) does this automatically. If you manage the connection yourself or provide your own wrapper around pool.connect() then you need to ensure that errant connections are evicted via client.release(err).

A “smarter” wrapper might differentiate between transient errors (e.g. a UNIQUE key violation) and permanent ones (e.g. a killed connection), but a simple default is to design applications with queries that do not fail, and to evict connections from the pool when they do.
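
The "simple default" described in this comment might look like the sketch below. It is only an illustration of the `client.release(err)` pattern: `withClient` and `work` are hypothetical names, and `pool` is assumed to be a pg.Pool whose `connect()` resolves to a client with a `release` method.

```javascript
// Sketch of a wrapper around pool.connect() that evicts the client on any error.
// `pool` is assumed to be a pg.Pool; `work` is a hypothetical callback that
// receives the checked-out client and returns a promise.
async function withClient(pool, work) {
  const client = await pool.connect();
  let failure;
  try {
    return await work(client);
  } catch (err) {
    failure = err;
    throw err;
  } finally {
    // A truthy argument tells the pool to destroy this client rather than
    // return it for reuse; undefined releases it back to the pool as usual.
    client.release(failure);
  }
}
```

A caller would use it as `await withClient(pool, (client) => client.query('SELECT 1'))`. Evicting on every error is deliberately blunt: it also discards healthy connections after ordinary query errors, which is the trade-off the comment accepts in exchange for never reusing a dead client.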

1 reaction
vitaly-t commented, Aug 15, 2019

@jmacmahon At this point I suggest that you follow it up here instead 😉
