CockroachDB: Highly Variable Query Response Times

Bug description

I am using Prisma to connect to a hosted Dedicated CockroachDB instance via the PostgreSQL interface. A number of my SELECT queries run very slowly in production (e.g. ~2-3s) on the first run or two, but then quickly fall to <10ms on subsequent calls. If I wait a while, the queries become slow again. The queries are simple and the data returned is small (~5 rows and 10 columns), and the issue is present for a variety of queries. I’ve confirmed this is likely not related to the Cockroach instance: when I issue the same raw query via Prisma and via node-postgres against the same database, node-postgres reliably returns results in <10ms.

The symptoms look like they might be related to some queries opening a new connection or to connection pooling, but I’ve not been successful in fixing this. I explicitly call $connect before issuing any of the queries, and I’ve experimented with changing the connection pool limit from 1 to 12 with no change in behavior.

How to reproduce

1. Connect to my database via $connect.
2. Issue a raw SQL query like SELECT main.prices.id, main.prices.merchant_id, main.prices.tiers_mode, main.prices.user_id, main.prices.updated_at, main.prices.created_at, main.prices.tiers, main.prices.crdb_region, main.prices.user_country_code FROM main.prices WHERE main.prices.id IN ('price_01FBJ41K2911YCF4NY7KHK81QS','price_01FBQ7SH0ZEJBKBET0TYK5WJPH','price_01FBQ7TSPRND2MPGWT1HVK5PN3','price_01FF3V48DKY82AX5J2EDSEP590','price_01FHWZX6XAG2RGFJ4QBA232CWE') OFFSET 0.
   Note: The same thing happens when issuing a normal Prisma query.
3. Measure the time to return.

The first few runs take ~1-3 seconds, but subsequent runs take <10ms. If I wait a bit, the queries are slow again. I’ve compared the same queries using node-postgres (https://node-postgres.com/) against the same database on the same server and don’t see the same variability in latencies.
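The comparison described above can be sketched with a small timing helper. This is only an illustration: `timeRuns` and `summarize` are hypothetical names, not part of Prisma or node-postgres, and the helper assumes the runs are issued sequentially so that the first run is the "cold" one.

```typescript
// Hypothetical timing helper; not part of Prisma or node-postgres.
// Runs `run` sequentially `count` times and records each duration in ms.
async function timeRuns(
  run: () => Promise<unknown>,
  count: number,
  now: () => number = Date.now, // injectable clock (assumption, for testing)
): Promise<number[]> {
  const durationsMs: number[] = [];
  for (let i = 0; i < count; i++) {
    const start = now();
    await run(); // sequential on purpose: run 1 is the "cold" run
    durationsMs.push(now() - start);
  }
  return durationsMs;
}

// Separates the first (cold) run from the median of the remaining (warm) runs.
function summarize(durationsMs: number[]): { firstMs: number; warmMedianMs: number } {
  const [firstMs, ...rest] = durationsMs;
  const sorted = [...rest].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const warmMedianMs =
    sorted.length === 0
      ? NaN
      : sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
  return { firstMs, warmMedianMs };
}
```

To compare the two clients, the same SQL string could be passed through each one, for example `timeRuns(() => prisma.$queryRawUnsafe(sql), 5)` and `timeRuns(() => pool.query(sql), 5)`, where `prisma` is a `PrismaClient` and `pool` is a node-postgres `Pool` that have already been set up. A large gap between `firstMs` and `warmMedianMs` for one client but not the other would match the behavior reported here.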

Expected behavior

I expect the latencies for these simple queries to be on par with node-postgres and be consistently <10ms.

Prisma information

generator client {
  provider      = "prisma-client-js"
  binaryTargets = ["native", "linux-arm64-openssl-1.1.x"]
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model income_estimates {
  id                  String
  user_id             String
  created_at          DateTime              @default(now()) @db.Timestamptz(6)
  updated_at          DateTime              @default(now()) @db.Timestamptz(6)
  estimate_object     Json
  deduplicating_id    String
  income_sources      Json
  users               users                 @relation(fields: [crdb_region, user_id], references: [crdb_region, id], onDelete: Cascade, map: "income_estimates_users_id_fk")
  discounted_catalogs discounted_catalogs[]
  discounted_plans    discounted_plans[]
  crdb_region         crdb_internal_region
  user_country_code   String

etc.

Environment & setup

Google Cloud Kubernetes (Debian)
CockroachDB via PostgreSQL interface
v16.13.1

Prisma Version

prisma                  : 3.8.0
@prisma/client          : 3.8.0
Current platform        : debian-openssl-1.1.x
Query Engine (Node-API) : libquery-engine 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/libquery_engine-debian-openssl-1.1.x.so.node)
Migration Engine        : migration-engine-cli 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/migration-engine-debian-openssl-1.1.x)
Introspection Engine    : introspection-core 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/introspection-engine-debian-openssl-1.1.x)
Format Binary           : prisma-fmt 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/prisma-fmt-debian-openssl-1.1.x)
Default Engines Hash    : 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f
Studio                  : 0.452.0

Top GitHub Comments

rafiss commented, Mar 11, 2022

Just sharing more background info that I found.

I set up Wireshark and captured the traffic between Prisma and CRDB while running the example that @ppoddar-affordably shared.

- The connection to CRDB was set up when I ran yarn start:dev.
- Running the curl command did not cause a new connection to CRDB to be created.
- The first time I ran the curl command, Prisma sent a few extra queries before executing the SELECT * FROM test.product_costs; query:
  - SELECT t.typname, t.typtype, t.typelem, r.rngsubtype, t.typbasetype, n.nspname, t.typrelid FROM pg_catalog.pg_type t LEFT OUTER JOIN pg_catalog.pg_range r ON r.rngtypid = t.oid INNER JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid WHERE t.oid = $1
  - SELECT enumlabel FROM pg_catalog.pg_enum WHERE enumtypid = $1 ORDER BY enumsortorder
    - The value of the $1 parameter is 100106 – this is the type ID for the tiers_mode enum.
  - The latency of the curl command was 29ms.
- The second time I ran the curl command, Prisma sent a SELECT 1 query before running the product_costs query.
  - This is probably a connection pool healthcheck.
  - The latency of the curl command was 3ms.
- The third time was the same as the second, except the latency was 11ms.

These pg_catalog queries definitely add to latency, and would be worse in a multiregion cluster.

I read https://github.com/prisma/prisma/issues/2921#issuecomment-662423290 which also discusses these enum queries, and it seems like they are executed if the user-initiated query uses a user-defined type and the client does not already have the type information cached. These enum queries are coming from the tokio-postgres Rust driver. See https://github.com/sfackler/rust-postgres/blob/3e4be865318ddd4a6b4493d689703db32ca3d184/tokio-postgres/src/prepare.rs#L19
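The caching behavior described above can be modeled with a short sketch. This is an illustrative model only, not the tokio-postgres implementation: `ConnectionModel` is a hypothetical class, and it assumes the type cache is kept per connection and that each uncached user-defined type costs two catalog round trips (one pg_type lookup and one pg_enum lookup, as seen in the capture).

```typescript
// Illustrative model only; this is NOT how tokio-postgres is actually written.
type TypeInfo = { oid: number; enumLabels: string[] };

class ConnectionModel {
  // Assumption: the cache lives on the connection and starts empty.
  private typeCache = new Map<number, TypeInfo>();
  public roundTrips = 0;

  // Stand-in for the two pg_catalog queries observed in the Wireshark capture.
  private fetchTypeInfo(oid: number): TypeInfo {
    this.roundTrips += 2; // one pg_type query + one pg_enum query
    return { oid, enumLabels: [] };
  }

  // Simulates a query whose columns use the given user-defined type OIDs.
  // Returns the number of round trips this call needed.
  query(userTypeOids: number[]): number {
    const before = this.roundTrips;
    for (const oid of userTypeOids) {
      if (!this.typeCache.has(oid)) {
        this.typeCache.set(oid, this.fetchTypeInfo(oid));
      }
    }
    this.roundTrips += 1; // the user's own query
    return this.roundTrips - before;
  }
}

const conn = new ConnectionModel();
const firstCall = conn.query([100106]);  // cold: catalog lookups + the query
const secondCall = conn.query([100106]); // warm: the query only
const noEnumCall = new ConnectionModel().query([]); // no user-defined types
```

Under these assumptions the first call needs three round trips and later calls on the same connection need one, while a query that touches no user-defined type needs one even on a fresh connection. Each extra round trip pays the full client-to-cluster latency, which is why the effect would be more visible on a multiregion cluster.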

ppoddar-affordably commented, Mar 14, 2022

I just repro’d on my global cluster. When I select any column that is not a user-defined enum, the first query returns quickly (<10ms), but once I specify * or a user-defined enum column, e.g. tiers_mode, the first-query latency spikes to 1.7s and then comes down. So @rafiss’s finding is almost definitely the culprit. I appreciate the deep dive here, and I’m curious about your thoughts on the best way to proceed and whether there is any “fix” for this.
