"Client was closed and is not queryable" when using client from pool in transactions
See original GitHub issueRan into a very puzzling bug yesterday when trying to implement database transactions, related to but not the same issue as #1892.
Node version: 12.18.3 Pg version: 8.5.1 OS: Mac & Linux (replicated both locally and on CI)
I was using pool clients in order to create long-running transactions, and found that sometimes when I went to rollback a commit, the database client had sometimes (but inconsistently) been released back to the pool. We were using the async/await syntax for clients, example pseudocode below:
async function transaction() {
const client = await pool.connect();
try {
await client.query('BEGIN;');
await client.query('some insert query')
const result = await client.query('some query');
if (!result.isValid) {
// jump to catch
throw new Error('invalid result');
}
await client.query('COMMIT;');
return result;
} catch (err) {
try {
// error here
await client.query('ROLLBACK;');
} catch (e) {
console.log('could not rollback: ', e);
}
throw err;
} finally {
client.release();
}
}
When running our tests, at higher loads we would get the following error:
could not rollback: Error: Client was closed and is not queryable
The client wasn’t being released by the client.release()
call, since that always happens after any rollbacks, so I added some logging statements, and it looks like the Pool
class (from pg-pool) is logging this.log('remove idle client')
right before these error in the rollback happens. We found that setting the idleTimeoutMillis: 0
fixed the issue since it prevented the idle client timeout from ever being created.
The oddest part of this is that the clients didn’t need to be idle for longer than the default of 10s for them to be closed - we logged the time at which the client was removed from the pool, and the time at which the failed rollback was attempted, and it was often less than a second.
Is there some reason why the clients are being closed earlier than the timeout? Perhaps the timeouts from earlier instances of the clients aren’t being cleared properly?
Any insight would be greatly appreciated, we’d love to be able to use the transactions in prod, but due to this issue they don’t seem to scale very well.
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (2 by maintainers)
Top GitHub Comments
I’ll try to find the time this weekend to put a MWE together!
Having a similar issue, replicating as self-contained script/repo is proving impossible - issue is only appearing for me intermittently when making a lot of queries from a hard hitting test inside a Docker with https://sailsjs.com/ - so can’t rule out Docker/Sails funny stuff. Some times as much as 80% of the time, but then run the same tests on “bare metal” and there’s no problems.
It appears that the setTimeout in https://github.com/brianc/node-postgres/blob/c99fb2c127ddf8d712500db2c7b9a5491a178655/packages/pg-pool/index.js#L352-L358 is killing a client that has finished it’s original query and then the same client has been reused for another query which then fails when it is killed mid-execution. Looking through code though, that does not make sense, but logging at the Postgres logs I see no queries hanging or any taking anywhere near 10,000ms to run.
Ended up on this issue because this issue is the closest to what I was seeing - queries failing with
remove idle client
- and making the below note for any future explorers who land here:Setting
idleTimeoutMillis: 0
mitigated the issue allowing the test to run.