Prisma not applying limit when take specified with id cursor
Bug description
I was noticing that when paginating through one of my larger tables with prisma, the first page was always loading quickly but subsequent ones were taking many seconds. My queries look like this:
```ts
function getPaginationArgs(cursor: string | undefined) {
  return {
    take: 10,
    cursor: cursor ? { id: cursor } : undefined,
    skip: cursor ? 1 : undefined,
    orderBy: { createdAt: "desc" },
  } as const;
}

const page1 = await prisma.page.findMany(getPaginationArgs(undefined));
const cursor = page1[page1.length - 1].id;
const page2 = await prisma.page.findMany(getPaginationArgs(cursor));
```
I went in and looked at the queries Prisma was issuing.

First page:

```sql
SELECT "prisma_test_schema_1"."page"."id", "prisma_test_schema_1"."page"."created_at", "prisma_test_schema_1"."page"."url" FROM "prisma_test_schema_1"."page" WHERE 1=1 ORDER BY "prisma_test_schema_1"."page"."created_at" DESC LIMIT $1 OFFSET $2
```

Second page:

```sql
SELECT "prisma_test_schema_1"."page"."id", "prisma_test_schema_1"."page"."created_at", "prisma_test_schema_1"."page"."url" FROM "prisma_test_schema_1"."page", (SELECT "prisma_test_schema_1"."page"."created_at" FROM "prisma_test_schema_1"."page" WHERE ("prisma_test_schema_1"."page"."id") = ($1)) AS "order_cmp" WHERE "prisma_test_schema_1"."page"."created_at" <= "order_cmp"."created_at" ORDER BY "prisma_test_schema_1"."page"."created_at" DESC OFFSET $2
```

The big difference here is that `LIMIT` is present in the first query but not in the second.
How to reproduce
Follow the instructions in the README at https://github.com/TLadd/prisma-unique-composite-key-query-bug/blob/master/README.md for a reproducible example. It is essentially just the code snippet above.
Expected behavior
I would expect the `LIMIT` to be applied to the second query as well.
Prisma information
```prisma
generator client {
  provider      = "prisma-client-js"
  binaryTargets = ["native", "debian-openssl-1.1.x"]
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model Page {
  id        String   @id @default(dbgenerated())
  createdAt DateTime @default(now()) @map("created_at")
  url       String   @unique

  @@map("page")
}
```
Environment & setup
- OS: macOS
- Database: PostgreSQL
- Node.js version: v12.16.2
- Prisma version: 2.12.1
Issue Analytics
- Created: 3 years ago
- Reactions: 4
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Short version: This is by design and not a bug. We have to fall back to an inefficient query because the query orders by `createdAt`, which is not a unique non-nullable (required) field. If you can, add a required unique field to the `orderBy` and you will see that the pagination becomes efficient again. Or use pure skip/take-based pagination without a cursor, because cursors are hard in SQL.

Long version: Cursor-based pagination is hard in SQL and requires a few trade-offs for the abstraction that Prisma uses. Let me explain why. On the surface, one might think it's a fairly simple issue to solve: "a few > / >= here and there and you've got it". Unfortunately, this is not the case. The issues come from the interplay of different factors, such as linear vs. non-linear data, whether the ordering is "unique", and how many order-by clauses you have. Let me illustrate the thought process and the issues with a few examples. Any ideas on how to solve it differently are welcome.
Note: Regardless of the above, the first query is always fast because we don't have to pin a cursor. We can just apply the ordering and take the first x elements, resulting in a simple, efficient query.
Now we get to anything past the first page, where we want to build a query that returns exactly the data we want. A common approach in SQL is `CURSOR .. FETCH`. The tl;dr here is: the Prisma architecture and use cases don't match this approach, in addition to the complications of the transaction lifecycle that would be required to make it work. This leaves the option of building a query from scratch for Prisma, so let's do that.

Sample data (assuming non-linear random IDs for a worst-case-ish illustration), ordered by `colA ASC`, consists of rows `rng1` through `rng6`; as relevant below, `rng3` and `rng4` both have `colA = B`.

Assume we have a cursor at `rng4` with `ORDER BY colA ASC` after the first page. We'd need a query that fetches `rng4`, `rng5` and `rng6` (Prisma returns the cursor row by convention, but that is not really relevant to the complexity of the issue).
(Prisma returns the cursor by convention, but not really relevant for the complexity of the issue).A naive first take like
WHERE id >= "rng4" ORDER BY colA ASC LIMIT <PageSize>
doesnāt work, because we canāt assume thatid
contains linear data, meaning thereās actually no way to useid
in the comparison in a meaningful way. For all we know,rng1
could come afterrng4
, lexicographically.This means we have to rely on the rest of the row the cursor points to to determine a way to fetch the next records. We have
colA = B
andcolB = B
. We only have useful information aboutcolA
because itās in the ordering and it states it is ascending, so we can use that info to fetch rows after the cursor. For that matter, we need to fetch the value ofcolA
of the row with cursorrng4
in a subquery so we can use it in the rest of our query (cursors always pin exactly one record):Based on
colA ASC
we can make an assertion that the records followingrng4
have to havecolA >= subquery.colA
(which is āBā in this example). A glaring issue with this query is that it still doesnāt guarantee that returned records come after the specified cursor, because as seen in the example,rng3
also hascolA = B
, meaning the result set is nowrng3 - 6
. To make things worse, we canāt apply a limit now because we donāt know how many records with identicalcolA
value we have that come before the cursor, so we canāt do something likeLIMIT <user specified take> + <num identical order rows before cursor>
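A small sketch of the over-fetch (the `colA` values for `rng1`, `rng2`, `rng5` and `rng6` are hypothetical; only the shared `B` on `rng3`/`rng4` comes from the example above):

```typescript
// Rows already in colA order; colA is not unique: rng3 and rng4 share "B".
type Row = { id: string; colA: string };

const ordered: Row[] = [
  { id: "rng1", colA: "A" },
  { id: "rng2", colA: "A" },
  { id: "rng3", colA: "B" },
  { id: "rng4", colA: "B" }, // cursor row
  { id: "rng5", colA: "C" },
  { id: "rng6", colA: "D" },
];

// Subquery step: look up colA of the cursor row.
const cursorColA = ordered.find((r) => r.id === "rng4")!.colA; // "B"

// WHERE colA >= cursorColA is the best predicate available without a
// unique order column; it cannot exclude rng3, which also has colA = "B"
// but comes before the cursor.
const superset = ordered.filter((r) => r.colA >= cursorColA).map((r) => r.id);
console.log(superset); // ["rng3", "rng4", "rng5", "rng6"]
```

The database can therefore only return this superset; trimming it down to the requested page (and applying `take`) has to happen afterwards, which is why the generated second-page query carries no `LIMIT`.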
Note 1: The end result is even more complicated (not all edge cases, like multiple order-bys or NULL values in the cursor row / ordering, have been taken into account, for brevity), but the above should suffice to illustrate the core issue.

Note 2: You can maybe come up with some fancy SQL to get a good heuristic going to reduce the fetch size to below all records after the cursor; we didn't look into that yet.

This is where it actually stops: we can't achieve 100% accuracy here. All we can do is a best effort to fetch a superset of the rows we want, which is admittedly bad, because we effectively fetch all rows after and surrounding the cursor and post-process them in the query engine to reduce the record set to exactly the page the query requested.
However, on the bright side, all of the above is basically solved with one thing: a non-nullable unique field in the orderBy. Any required unique field will do (or a combination of required fields as specified in `@@unique`). The entire issue of records with an identical order value (`colA`) vanishes, because we know each value can occur only once, so we can simply say `colA >= <unique value>` and apply limits again directly in the database.

I can reproduce this. Thanks for the reproduction.