node-postgres handles server disconnection differently on macOS and Linux
Hello, the company I work for has been using pg-promise
for our database service and we’ve run into an issue with failover, which we believe is caused by a bug in the underlying node-postgres
library. We believe the issue is due to Linux handling socket timeout events differently from macOS.
Steps to reproduce:
- Connect to a Postgres server. We used one hosted in AWS RDS with MultiAZ failover enabled.
- Run a query every N seconds
- Reboot the server such that the connection is dropped without a TCP FIN packet. We did a reboot with failover in AWS RDS.
Note: we believe this scenario is not specific to RDS, but rather any network outage or server failure which does not send a TCP FIN packet.
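For anyone trying to reproduce this, the "run a query every N seconds" step could look roughly like the sketch below. The helper name `startQueryLoop`, the interval and the log format are illustrative assumptions, not part of the original report. `runQuery` is whatever issues the query in your setup.

```javascript
// Calls runQuery every intervalMs and logs whether it succeeded or failed.
// runQuery is any function returning a promise, for example
//   () => pool.query('SELECT 1')   // pool being a pg.Pool (assumed setup)
function startQueryLoop(runQuery, intervalMs, log = console.log) {
  let inFlight = false;
  const timer = setInterval(async () => {
    if (inFlight) return; // previous query still pending (e.g. TCP is retransmitting)
    inFlight = true;
    const startedAt = Date.now();
    try {
      await runQuery();
      log(`ok after ${Date.now() - startedAt} ms`);
    } catch (err) {
      log(`failed after ${Date.now() - startedAt} ms: ${err.message}`);
    } finally {
      inFlight = false;
    }
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop the loop
}
```

Logging the elapsed time makes it easy to tell a query that fails after the TCP retransmission window from one that fails immediately.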
Expected outcome and actual outcome on macOS:
- The next query fails and the failing client is removed from the pool.
- Subsequent queries use a new client which tries to establish a fresh connection.
- When the server reboot/failover is complete, these queries will succeed.
Actual outcome on Linux:
- The next query fails, but the bad client is not removed from the pool.
- Subsequent queries try to re-use the bad client and fail even after the reboot/failover is complete.
Detailed order of events
macOS
- Successful query
- Reboot DB
- DB stops listening on original IP
- Client begins a further query
- TCP sends query, does not receive an ACK
- TCP begins retransmission, does not receive an ACK
- Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
- Client rejects query promise with “Error: read ETIMEDOUT”
- Immediately after the error the “connection” remains in node-pg’s pool
- Almost immediately afterwards the pool emits an “error” event
- The “connection” is removed from node-pg’s pool
- Client begins another query
- DNS fetches the new IP
- TCP successfully sends the query to the new IP and receives the result
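Since the pool emits an “error” event in this sequence, it is worth having a listener attached: a pg Pool is a Node.js EventEmitter, and Node throws when an “error” event is emitted with no listener. A minimal sketch, where the helper name `logPoolErrors` and the log format are assumptions made for illustration:

```javascript
// Attaches an 'error' listener to a pool so that errors emitted by the pool
// are logged rather than thrown as an unhandled 'error' event.
// `pool` is expected to be a pg.Pool (or any EventEmitter).
function logPoolErrors(pool, log = console.error) {
  pool.on('error', (err) => {
    log(`pool error: ${err.message}`);
  });
  return pool;
}
```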
Linux
- Successful query
- Reboot DB
- DB stops listening on original IP
- Client begins a further query
- TCP sends query, does not receive an ACK
- TCP begins retransmission, does not receive an ACK
- Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
- Client rejects query promise with an “Error: Connection terminated unexpectedly”
error Error: Connection terminated unexpectedly
    at Connection.con.once (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:235:9)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:106:13)
    at Connection.emit (events.js:208:7)
    at Socket.<anonymous> (/src/node_modules/pg-promise/node_modules/pg/lib/connection.js:131:10)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1056:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)
- After the error, the “connection” remains in node-pg’s pool
- Subsequent queries fail immediately without sending any data with the following error:
error Error: Client has encountered a connection error and is not queryable
    at process.nextTick (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:500:25)
    at _combinedTickCallback (internal/process/next_tick.js:131:7)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)
Issue Analytics
- State:
- Created 4 years ago
- Comments:24 (19 by maintainers)
The pool is not aware of the connection status. It’s up to the user to inform the pool that the connection is either still valid (via client.release()) or invalid (via client.release(err)).
From the docs:
Using pool.query(...) does this automatically. If you manage the connection yourself or provide your own wrapper around pool.connect() then you need to ensure that errant connections are evicted via client.release(err).
A “smarter” wrapper might differentiate between transient errors (ex: UNIQUE key violation) and permanent ones (ex: connection killed), but a simple default is to design applications with queries that do not fail and evict connections from the pool when they do.
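A wrapper along those lines might look like the sketch below. The names `withClient` and `isConnectionError` are hypothetical, and the classification rule is only a heuristic assumed here: errors reported by the server are taken to carry a `severity` and a five-character SQLSTATE in `code`, and anything else (ETIMEDOUT, “Connection terminated unexpectedly”, and so on) is treated as a broken connection.

```javascript
// Heuristic (assumption): decide whether an error means the connection is unusable.
// Server-reported errors are expected to have `severity` and a 5-character SQLSTATE `code`.
function isConnectionError(err) {
  const isServerError =
    typeof err.severity === 'string' &&
    typeof err.code === 'string' &&
    err.code.length === 5;
  if (!isServerError) return true; // socket-level or driver-level failure
  // SQLSTATE class 08 = connection exception; 57Pxx = shutdown / cannot connect
  return err.code.startsWith('08') || err.code.startsWith('57P');
}

// Checks out a client, runs fn(client), and always releases the client.
// On failure the client is evicted via client.release(err) when shouldEvict(err) is true.
// The default evicts on every error, which is the "simple default" described above.
async function withClient(pool, fn, shouldEvict = () => true) {
  const client = await pool.connect();
  let result;
  try {
    result = await fn(client);
  } catch (err) {
    client.release(shouldEvict(err) ? err : undefined);
    throw err;
  }
  client.release(); // healthy client goes back to the pool
  return result;
}
```

With a pg.Pool this would be called as `withClient(pool, (client) => client.query('SELECT 1'))`, or with `isConnectionError` as the third argument to keep clients that only hit something like a UNIQUE violation.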
@jmacmahon At this point I suggest that you follow it up here instead 😉