"timeout exceeded when trying to connect" spike after upgrading to 8.2.1

Yesterday we upgraded from pg@8.0.3 (with pg-pool@3.1.1) to pg@8.2.1 (with pg-pool@3.2.1). The relevant entry from our package-lock.json:

    "pg": {
      "version": "8.2.1",
      "resolved": "https://registry.npmjs.org/pg/-/pg-8.2.1.tgz",
      "integrity": "sha512-DKzffhpkWRr9jx7vKxA+ur79KG+SKw+PdjMb1IRhMiKI9zqYUGczwFprqy+5Veh/DCcFs1Y6V8lRLN5I1DlleQ==",
      "requires": {
        "buffer-writer": "2.0.0",
        "packet-reader": "1.0.0",
        "pg-connection-string": "^2.2.3",
        "pg-pool": "^3.2.1",
        "pg-protocol": "^1.2.4",
        "pg-types": "^2.1.0",
        "pgpass": "1.x",
        "semver": "4.3.2"
      },
      "dependencies": {
        "pg-connection-string": {
          "version": "2.2.3",
          "resolved": "https://registry.npmjs.org/pg-connection-string/-/pg-connection-string-2.2.3.tgz",
          "integrity": "sha512-I/KCSQGmOrZx6sMHXkOs2MjddrYcqpza3Dtsy0AjIgBr/bZiPJRK9WhABXN1Uy1UDazRbi9gZEzO2sAhL5EqiQ=="
        },
        "semver": {
          "version": "4.3.2",
          "resolved": "https://registry.npmjs.org/semver/-/semver-4.3.2.tgz",
          "integrity": "sha1-x6BxWKgL7dBSNVt3DYLWZA+AO+c="
        }
      }
    }

We started seeing a spike in "timeout exceeded when trying to connect" errors with this stack trace:

    Error: timeout exceeded when trying to connect
        at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:188:27)
        at listOnTimeout (internal/timers.js:549:17)
        at processTimers (internal/timers.js:492:7)

This is a pretty basic Express app with a Postgres 12 backend, running on Node 12.

We report metrics on the connection pool's max/total/idle/waiting counts. There is an obvious spike in the waiting count from the time the 8.2.1 upgrade was deployed (around 9am CT yesterday), followed by a drop when we reverted that change (about 6am CT today):

[image: graph of connection pool max/total/idle/waiting counts]

That corresponds with our API request/response/error rates (again, just a simple Express app over a pg db):

[image: graph of API request/response/error rates]

We’re not sure how to debug this. These are the relevant values we’re using related to the Pool config:

  • connectionTimeoutMillis = 60000
  • idleTimeoutMillis = 60000
  • max = 150
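
For illustration, here is a minimal sketch of how the three values above might be passed to a pool. It is not the code from this app: `poolConfig` and `createPool` are hypothetical names, and the Pool constructor is passed in as an argument so the snippet makes no assumption about how pg is imported.

```javascript
// Sketch only: the option values come from the list above; createPool and
// its PoolCtor parameter are hypothetical names used for illustration.
const poolConfig = {
  connectionTimeoutMillis: 60000, // give up on acquiring a client after 60 s
  idleTimeoutMillis: 60000,       // close clients that have sat idle for 60 s
  max: 150,                       // upper bound on clients per pool
};

// Builds a pool from the shared config, optionally overriding single values
// (for example a smaller max in a staging environment).
function createPool(PoolCtor, overrides = {}) {
  return new PoolCtor({ ...poolConfig, ...overrides });
}

// Usage (assumes the pg package is installed):
//   const { Pool } = require('pg');
//   const pool = createPool(Pool);
```

Keeping the values in one object makes it easy to try a different setting in staging without touching the rest of the app.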

This also showed up in our staging environment, but we didn’t have an alert set up for it (we do now). If there is something we can do to help debug this and provide information back, we can probably do it in staging.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 36 (22 by maintainers)

Top GitHub Comments

1 reaction
heitordobeis commented, Feb 14, 2022

Hi @mriedem, I have the same problem today. Did you find a solution?

1 reaction
mriedem commented, Jul 9, 2020

Okay, and I’m guessing it happens somewhat “at random”, meaning your app has been running for a while and then you get a timeout? Once you get a timeout, do you get a bunch, or is it just one every once in a while?

I would have to dig into this. In our production cluster the app runs in 20 replicas, each with a pool configured for 150 connections. Our readiness probe hits an API that runs a select now(); query to make sure that pod’s connection pool is OK: if the pool is full, we want the pod to go out of rotation for traffic until it can drain its connection requests. The pod will only crash and restart automatically if an uncaught error slips through.
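
A readiness check along those lines could look roughly like the sketch below. This is an assumption about the shape of such a probe, not the code from this app: `checkPoolReady` is a hypothetical helper, the `/ready` route path is made up, and `pool` is only assumed to expose a promise-returning `query(text)` method, as pg's Pool does.

```javascript
// Sketch only: checkPoolReady is a hypothetical helper. `pool` is assumed to
// expose a promise-returning query(text) method, as pg's Pool does.
async function checkPoolReady(pool) {
  try {
    // The query has to check a client out of the pool first, so it also
    // fails when the pool cannot hand out a client within its timeout.
    await pool.query('select now()');
    return true;
  } catch (err) {
    // Any rejection (including a checkout timeout) counts as not ready.
    return false;
  }
}

// Usage in an Express route (the route path is an assumption):
//   app.get('/ready', async (req, res) => {
//     res.sendStatus((await checkPoolReady(pool)) ? 200 : 503);
//   });
```

Note that with connectionTimeoutMillis set to 60000, a check like this can take up to a minute to fail when the pool is exhausted, so the probe's own timeout matters as well.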

I think I can say that when it hits, we get a bunch of timeouts. That would probably explain why the waiting count per pod (in the graph in the issue description) spikes: presumably something is blocking in the pool, so other requests wait until a timeout occurs.
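
To make the "waiting count" concrete, here is a small sketch of how such per-pod numbers might be read off a pool. `poolMetrics` and its `saturated` flag are hypothetical; `totalCount`, `idleCount`, and `waitingCount` are properties that pg-pool exposes on a Pool instance, and `max` is passed in explicitly rather than read from the pool.

```javascript
// Sketch only: poolMetrics is a hypothetical helper; totalCount, idleCount,
// and waitingCount are properties exposed by a pg-pool Pool instance.
function poolMetrics(pool, max) {
  const total = pool.totalCount;     // clients currently held by the pool
  const idle = pool.idleCount;       // clients checked in and unused
  const waiting = pool.waitingCount; // queued requests waiting for a client
  return {
    max,
    total,
    idle,
    waiting,
    // Saturated: every slot is taken, none is idle, and requests are queued.
    saturated: total >= max && idle === 0 && waiting > 0,
  };
}

// Usage (interval and logging are assumptions):
//   setInterval(() => console.log(poolMetrics(pool, 150)), 10000);
```

A sustained non-zero `waiting` value while `total` is below `max` would point at something other than plain pool exhaustion, which is the kind of detail that could help narrow this down.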
