Handling connection timeouts / dropped network / db failover
See original GitHub issueHello,
I’m working on setting up a health check within our system, and have noticed that our Postgres implementation doesn’t seem to handle connection timeouts, dropped network, or a db failover event very gracefully (at least in the way it works when using a multi-az deployment in Amazon RDS). The issue is very similar to one that was fixed in the mysql package a while back – https://github.com/mysqljs/mysql/issues/821.
We effectively have a class that looks similar to this:
const Promise = require('bluebird');
const pg = require('pg');
class PostgresSQLInterface extends Service {
constructor(dbConnectionOptions, logger) {
const defaults = {
host: 'localhost',
port: '5432',
user: '',
password: null,
max: 10, // Poolsize
min: 10,
keepAlive: true,
idleTimeoutMillis: 10000
};
this.state = {
config: _.merge({}, defaults, dbConnectionOptions)
};
this.createDatabasePool();
}
createDatabasePool() {
this.state.pool = new pg.Pool(this.state.config);
this.state.pool.on('connect', () => {
this.state.healthy = true;
console.log('HEALTHY');
});
this.state.pool.on('error', () => {
this.state.healthy = false;
console.log('UNHEALTHY');
});
}
*getDatabaseConnection() {
try {
const connection = yield this.state.pool.connect();
return connection;
}
catch (error) {
throw new Errors.DatabaseError("Could not connect to Postgres", {
originalError: error
});
}
}
};
In a normal environment, when our application connects, I see it spit out “HEALTHY” x the number of connections it made, as expected. However, there are a few issues:
1.) If the connection is severed (I turn off my wireless, kill my VPN, or trigger a reboot on the RDS instance), no error events are raised even though the documentation for pg-pool states an error is “Emitted whenever an idle client in the pool encounters an error. This is common when your PostgreSQL server shuts down, reboots, or a network partition otherwise causes it to become unavailable while your pool has connected clients.” My expectation would be to see a string of “UNHEALTHY” statements.
2.) Similarly, if I initiate a failover in RDS, no error events are raised but I am unable to run queries. This may be due to how AWS handles the DNS entries around multi-az deployments, but I cannot test this while issue #1 remains unresolved.
3.) Calling getDatabaseConnection when no replies are received from postgres (sudo iptables -A INPUT -p tcp --source-port 5432 -j DROP, sudo iptables -F afterwards to restore) hangs on grabbing a connection, as there is no timeout setting.
Am I missing a setting in my configuration, or is this a situation that just hasn’t come up yet for someone else? If it’s the latter, I’d be more than happy to create a fix for the issue, but want to make sure I’m not overlooking something first.
Node version: 6.2.x PG version: 6.0.2 Postgres version: 9.5.x
Thanks!
Issue Analytics
- State:
- Created 7 years ago
- Comments:33 (14 by maintainers)
Top GitHub Comments
I think I’ve found the fix for this. When the backend server disconnects or socket is closed unexpectedly the client would emit an
end
event, but not anerror
event. This resulted in silent disconnections, pools full of disconnected clients. inability to handle db failover, etc. I’ve put a patch out atpg@6.2.4
- please check it out & lemme know if it fixes your issue!Hey, sorry for reviving such an old issue but, from my own testing attempting to recover from Multi-AZ RDS failover scenarios (which I’m simulating by issuing a
docker pause
on a db instance and then updating a DNS record to point to another) I would think this is actually not solved. I believe (I’m currently attempting to get a confirmation from AWS) what @jessedpate claims here is right, which would mean we actually need a way to hook in a timeout at the socket level.Our current workaround amounts to:
added to
lib/connection.js
.This is actually possible with the current implementation, since the library allows passing an already existing socket object in the
Client
configuration, but some libraries (knex) don’t actually have hook-points available to modify the creation of the connection (at least from what I could see), and had us jumping around some loops to achieve the same behaviour (afterCreate
on tarn’s config).All in all, I think it’d be worthwhile to actually allow for configuration of the property, since as mentioned above, it’s available on other drivers.