CockroachDB: Highly Variable Query Response Times
Bug description
I am using Prisma to connect to a hosted Dedicated CockroachDB instance via the PostgreSQL interface. A number of my SELECT queries run very slowly in production (~2-3s) on the first run or two, but then quickly fall to <10ms on subsequent calls. If I wait a while, the queries become slow again. The queries are simple and the data returned is small (~5 rows and 10 columns), and the issue is present for a variety of queries. I’ve confirmed this is likely not related to the Cockroach instance: I compared issuing the same raw query via Prisma vs node-postgres against the same database, and node-postgres reliably returns results in <10ms.
The symptoms seem like they might be related to some queries starting a new connection and to pooling, but I’ve not been successful in fixing this. I explicitly call $connect before issuing any of the queries, and I’ve experimented with changing the connection pool limit from 1 to 12 with no change in behavior.
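For reference, Prisma reads the pool size from the connection_limit parameter of the connection URL. The sketch below shows one way to set it programmatically. The helper name withConnectionLimit and the example URL are hypothetical, and the issue does not say how the limit was actually changed.

```typescript
// Hypothetical helper: returns the connection URL with connection_limit set,
// replacing any existing connection_limit parameter.
function withConnectionLimit(databaseUrl: string, limit: number): string {
  const q = databaseUrl.indexOf("?");
  const base = q === -1 ? databaseUrl : databaseUrl.slice(0, q);
  const query = q === -1 ? "" : databaseUrl.slice(q + 1);

  // Keep every existing parameter except a previous connection_limit.
  const params = query
    .split("&")
    .filter((p) => p !== "" && !p.startsWith("connection_limit="));
  params.push(`connection_limit=${limit}`);

  return `${base}?${params.join("&")}`;
}

// Usage (assumes @prisma/client is installed and the datasource is named "db"):
//   const url = withConnectionLimit(process.env.DATABASE_URL!, 12);
//   const prisma = new PrismaClient({ datasources: { db: { url } } });
```

Since the reporter saw no change between limits of 1 and 12, this is mainly useful for ruling pool size out as a factor.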
How to reproduce
- Connect to my database via $connect.
- Issue a raw SQL query like:
SELECT main.prices.id, main.prices.merchant_id, main.prices.tiers_mode, main.prices.user_id, main.prices.updated_at, main.prices.created_at, main.prices.tiers, main.prices.crdb_region, main.prices.user_country_code FROM main.prices WHERE main.prices.id IN ('price_01FBJ41K2911YCF4NY7KHK81QS','price_01FBQ7SH0ZEJBKBET0TYK5WJPH','price_01FBQ7TSPRND2MPGWT1HVK5PN3','price_01FF3V48DKY82AX5J2EDSEP590','price_01FHWZX6XAG2RGFJ4QBA232CWE') OFFSET 0
Note: The same thing happens when issuing a normal Prisma query.
- Measure the time to return.
The first few runs take ~1-3 seconds, but subsequent runs take <10ms. If I wait a bit, the queries are slow again. I’ve run the same queries using node-postgres (https://node-postgres.com/) against the same database on the same server and don’t see the same variability in latencies.
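The measurement step can be sketched roughly as follows. This is not the reporter's script: measureLatencies and summarize are hypothetical names, and the query runner is passed in as a function so the same harness can wrap either Prisma or node-postgres.

```typescript
// Any function that runs one query and resolves when the result has arrived.
type RunQuery = () => Promise<unknown>;

// Runs the query sequentially `runs` times and records each wall-clock latency in ms.
async function measureLatencies(run: RunQuery, runs: number): Promise<number[]> {
  const latencies: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await run();
    latencies.push(Date.now() - start);
  }
  return latencies;
}

// Separates the first run from the rest, since the issue is about first-run latency.
// medianRestMs is the middle element of the sorted remaining runs (upper median for even counts).
function summarize(latenciesMs: number[]): { firstMs: number; medianRestMs: number } {
  const [firstMs = NaN, ...rest] = latenciesMs;
  const sorted = [...rest].sort((a, b) => a - b);
  const medianRestMs = sorted[Math.floor(sorted.length / 2)] ?? NaN;
  return { firstMs, medianRestMs };
}

// Usage (assumes an existing PrismaClient `prisma` and a node-postgres Pool `pool`):
//   const sql = "SELECT id, tiers_mode FROM main.prices LIMIT 5";
//   console.log(summarize(await measureLatencies(() => prisma.$queryRawUnsafe(sql), 10)));
//   console.log(summarize(await measureLatencies(() => pool.query(sql), 10)));
```

Reporting the first run separately from the median of the rest makes the "slow first, fast afterwards" pattern visible without averaging it away.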
Expected behavior
I expect the latencies for these simple queries to be on par with node-postgres and be consistently <10ms.
Prisma information
generator client {
  provider      = "prisma-client-js"
  binaryTargets = ["native", "linux-arm64-openssl-1.1.x"]
}
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}
model income_estimates {
  id                  String
  user_id             String
  created_at          DateTime @default(now()) @db.Timestamptz(6)
  updated_at          DateTime @default(now()) @db.Timestamptz(6)
  estimate_object     Json
  deduplicating_id    String
  income_sources      Json
  users               users @relation(fields: [crdb_region, user_id], references: [crdb_region, id], onDelete: Cascade, map: "income_estimates_users_id_fk")
  discounted_catalogs discounted_catalogs[]
  discounted_plans    discounted_plans[]
  crdb_region         crdb_internal_region
  user_country_code   String
etc.
Environment & setup
- Google Cloud Kubernetes (Debian)
- CockroachDB via PostgreSQL interface
- Node.js v16.13.1
Prisma Version
prisma : 3.8.0
@prisma/client : 3.8.0
Current platform : debian-openssl-1.1.x
Query Engine (Node-API) : libquery-engine 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/libquery_engine-debian-openssl-1.1.x.so.node)
Migration Engine : migration-engine-cli 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/migration-engine-debian-openssl-1.1.x)
Introspection Engine : introspection-core 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/introspection-engine-debian-openssl-1.1.x)
Format Binary : prisma-fmt 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/prisma-fmt-debian-openssl-1.1.x)
Default Engines Hash : 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f
Studio : 0.452.0
Issue Analytics
- Created 2 years ago
- Reactions: 2
- Comments: 38 (13 by maintainers)
Top GitHub Comments
Just sharing more background info that I found.
I set up Wireshark and captured the traffic between Prisma and CRDB while running the example that @ppoddar-affordably shared.
- I started the app with yarn start:dev.
- The curl command did not cause a new connection to CRDB to be created.
- On the first curl command, Prisma sent a few extra queries before executing the SELECT * FROM test.product_costs; query:
SELECT t.typname, t.typtype, t.typelem, r.rngsubtype, t.typbasetype, n.nspname, t.typrelid FROM pg_catalog.pg_type t LEFT OUTER JOIN pg_catalog.pg_range r ON r.rngtypid = t.oid INNER JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid WHERE t.oid = $1
SELECT enumlabel FROM pg_catalog.pg_enum WHERE enumtypid = $1 ORDER BY enumsortorder
- The $1 parameter is 100106, the type ID for the tiers_mode enum.
- The latency of the first curl command was 29ms.
- On the next curl command, Prisma sent a SELECT 1 query before running the product_costs query.
- The latency of that curl command was 3ms.
These pg_catalog queries definitely add to latency, and would be worse in a multiregion cluster.
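The capture suggests that type information is looked up once per connection and then reused. The toy model below illustrates that pattern. It is not tokio-postgres code: the class name, the list of built-in OIDs, and the cost of two lookups per unknown type are assumptions drawn from the two pg_catalog queries seen in the capture.

```typescript
// OIDs of a few built-in PostgreSQL types that need no catalog lookup:
// bool, int8, int4, text, timestamptz, jsonb.
const BUILTIN_OIDS = new Set<number>([16, 20, 23, 25, 1184, 3802]);

// Toy model of one connection with its own type cache (hypothetical, for illustration).
class ModelConnection {
  private typeCache = new Set<number>();

  // Returns how many statements are sent for a query whose result
  // columns have the given type OIDs.
  statementsFor(columnTypeOids: number[]): number {
    let statements = 1; // the user's query itself
    for (const oid of columnTypeOids) {
      if (BUILTIN_OIDS.has(oid) || this.typeCache.has(oid)) continue;
      statements += 2; // one pg_type lookup plus one pg_enum label lookup
      this.typeCache.add(oid);
    }
    return statements;
  }
}

const conn = new ModelConnection();
const columns = [25, 100106]; // a text column plus the enum OID from the capture
console.log(conn.statementsFor(columns)); // 3
console.log(conn.statementsFor(columns)); // 1
```

Under this model a new or recycled connection starts with an empty cache, which would be consistent with the slowness returning after an idle period. The thread does not confirm that explanation.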
I read https://github.com/prisma/prisma/issues/2921#issuecomment-662423290 which also discusses these enum queries. It seems they are executed if the user-initiated query uses a user-defined type and the client does not already have the type information cached. These enum queries are coming from the tokio-postgres Rust driver. See https://github.com/sfackler/rust-postgres/blob/3e4be865318ddd4a6b4493d689703db32ca3d184/tokio-postgres/src/prepare.rs#L19
I just repro’d on my global cluster. When I select any column that is not a user-defined enum, the first query returns quickly (<10ms), but once I specify * or the user-defined enum (e.g. tiers_mode), the first-query latency spikes to 1.7s and then comes down, so @rafiss’s finding is almost definitely the culprit. Appreciate the deep dive here. I’m curious for your thoughts on the best way to proceed and whether there is any “fix” for this.
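One way to isolate the enum lookup further is to compare a query that returns the enum column as-is with one that casts it to text, so that the result column has a built-in type. The sketch below only builds the two SQL strings. The helper name buildVariants is hypothetical, and whether the cast actually avoids the catalog lookups through Prisma is an untested assumption, not something confirmed in this thread.

```typescript
// Hypothetical helper: builds two variants of the same SELECT for comparison.
function buildVariants(
  table: string,
  idColumn: string,
  enumColumn: string
): { withEnum: string; castToText: string } {
  return {
    // Result column keeps the user-defined enum type.
    withEnum: `SELECT ${idColumn}, ${enumColumn} FROM ${table} LIMIT 5`,
    // Result column is cast to text, a built-in type (assumed to need no lookup).
    castToText: `SELECT ${idColumn}, ${enumColumn}::text AS ${enumColumn} FROM ${table} LIMIT 5`,
  };
}

const variants = buildVariants("main.prices", "id", "tiers_mode");
console.log(variants.withEnum);
console.log(variants.castToText);
```

Running each variant as the first query on a fresh connection and comparing latencies would show whether the enum type lookup accounts for the difference.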