Extremely slow subscriptions (seq scan instead of index)
```graphql
subscription liveMatches {
  matches(limit: 5, where: {status: {_eq: 3}}) {
    id
  }
}
```
I’m seeing responses on this query of anywhere from 1000ms to 3500ms, whereas a simpler version of the query returns in < 20ms. The table has an index on `status`. The table has a little over 1,000,000 rows, but this query matches at most 5–100 nodes. I’ve noticed that if the limit is above the total number of matching nodes, it gets particularly slow; if the limit is beneath the number of nodes, it’s quite fast. I’m not sure if that’s relevant, but it may be helpful data.
```
Nested Loop Left Join  (cost=0.56..0.60 rows=1 width=48)
  ->  Function Scan on _subs  (cost=0.01..0.01 rows=1 width=48)
  ->  Subquery Scan on matches  (cost=0.55..0.58 rows=1 width=32)
        ->  Aggregate  (cost=0.55..0.56 rows=1 width=32)
              ->  Limit  (cost=0.00..0.49 rows=5 width=32)
                    ->  Seq Scan on matches matches_1  (cost=0.00..53376.58 rows=545002 width=32)
                          Filter: (status = ((_subs.result_vars #>> '{synthetic,0}'::text[]))::integer)
              SubPlan 1
                ->  Result  (cost=0.00..0.01 rows=1 width=32)
```
As the plan above shows, it appears to be using a seq scan for what should just be a simple index scan.
Once this query starts, it alone hammers my database.
I sent this out on Discord but didn’t get a reply after 24 hours, so I figured I’d share it here. Thanks.

This is a tricky issue. After quite some investigation, I’ve managed to create a much simpler example that highlights what’s going wrong in the Postgres planning process.
The root problem
To start, we’ll create a table with a large number of rows (in this case 1,000,000) where all of those rows are “boring” (in this case, that means `status = 1`):
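A minimal sketch of the setup (the table and column names are assumptions, chosen to mirror the `matches` table from the original report):

```sql
CREATE TABLE matches (
  id     serial PRIMARY KEY,
  status integer NOT NULL
);

-- 1,000,000 "boring" rows, all with status = 1.
INSERT INTO matches (status)
SELECT 1 FROM generate_series(1, 1000000);
```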
Now let’s add a much smaller number of “interesting” rows to the end of the table, along with an index on `status`:
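Continuing the sketch (the index name is made up; the original issue only says that an index on `status` exists):

```sql
-- 1,000 "interesting" rows (status = 2 or 3) appended at the very
-- end of the table.
INSERT INTO matches (status)
SELECT 2 + (i % 2) FROM generate_series(1, 1000) AS i;

CREATE INDEX matches_status_idx ON matches (status);
```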
After running `VACUUM ANALYZE` to ensure statistics are up to date, we can generate a query plan for a rather simple query: one that fetches all the rows where `status` is `2` or `3`, which only appear in the small block of 1,000 “interesting” rows at the very end of the table. As expected, Postgres will consult the index for this query:
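Something along these lines (the exact query text is an assumption, and the plan shape is illustrative; node types and cost numbers vary by Postgres version and configuration):

```sql
VACUUM ANALYZE matches;

EXPLAIN SELECT id FROM matches WHERE status IN (2, 3);
-- Plan shape (illustrative): the index gets used.
--   Bitmap Heap Scan on matches
--     Recheck Cond: (status = ANY ('{2,3}'::integer[]))
--     ->  Bitmap Index Scan on matches_status_idx
--           Index Cond: (status = ANY ('{2,3}'::integer[]))
```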
But what happens if we add an innocuous `LIMIT 5` to the end of this query? We wouldn’t expect that to affect the query plan significantly, but in fact, it does. Specifically, it switches from an index scan to a full table scan:
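A sketch of the same query with the limit added (plan shape illustrative, as before):

```sql
EXPLAIN SELECT id FROM matches WHERE status IN (2, 3) LIMIT 5;
-- Plan shape (illustrative): no index in sight.
--   Limit
--     ->  Seq Scan on matches
--           Filter: (status = ANY ('{2,3}'::integer[]))
```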
This is a really bad idea, since it means the inner query has to scan through the 1,000,000 “boring” rows just to find 5 of the “interesting” rows at the end of the table. This strange query planning behavior is what is making these subscription queries so slow.

Why does this happen?
To understand why the Postgres query planner makes this decision, we can inspect the query plan that gets generated if we temporarily disable full-table scans altogether using `SET enable_seqscan TO off`, forcing the planner to use an index scan:
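A sketch of that experiment (plan shape illustrative):

```sql
SET enable_seqscan TO off;

EXPLAIN SELECT id FROM matches WHERE status IN (2, 3) LIMIT 5;
-- Plan shape (illustrative): the planner now has to use the index.
--   Limit
--     ->  Index Scan using matches_status_idx on matches
--           Index Cond: (status = ANY ('{2,3}'::integer[]))
```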
If you compare the estimated cost of this query to the estimated cost of the version using the full-table scan, you’ll notice that the full-table scan plan appears strictly better: its estimated cost bounds are 0.05..0.32 versus 0.43..0.68. Obviously, this estimate is bogus: in practice, the index scan is enormously faster. So where does this low cost estimate come from?

The issue has to do with an assumption that a limit simply linearly scales the cost of the limited query. That is, the expectation is that if a `LIMIT` clause only consumes 1% of the result set, then only 1% of the work will have been done to produce the entire result set. (You can see this scaling in the plan from the original report: the seq scan’s total cost of 53376.58 for an estimated 545002 rows is scaled down by the limit to roughly 53376.58 × 5 / 545002 ≈ 0.49, the upper bound shown on the Limit node.) However, this is only true if the filtered rows are uniformly distributed throughout the result set. If they are non-uniformly distributed, as is the case here, it can be dramatically more efficient to perform the filter by consulting the index, since that has a consistent cost regardless of the distribution of the rows.

Since consulting the index technically has a nonzero constant overhead (rows from the table itself need to be fetched to return the `id` column), after the cost is scaled by `LIMIT`, the full-table scan looks “cheaper.” Again, this is obviously disastrous in this situation, but it’s what Postgres decides to do. Some discussion about why Postgres continues to use this heuristic, despite it potentially being so flawed, is given in this mailing list thread.

What can be done about it?
Now for the tough question: what can be done to mitigate this issue? In general, there’s no obvious easy solution.
An extreme response would be to revert #2942 altogether and avoid parameterizing over values that don’t correspond to GraphQL query variables of scalar type. But this is pretty unsatisfying, for reasons given in the PR description of #2942: it means two semantically-identical GraphQL queries can have significant differences in performance, in a way that is pretty non-obvious to users.
A better solution would probably be to do something smarter: only parameterize over SQL values if those values came from a GraphQL variable, even if they’re nested inside a larger GraphQL value. That means we’d still parameterize over the `album_id` in the example given in #2942 (because it came from a GraphQL variable), but we wouldn’t parameterize over the `status` value in the original comment in this issue (because it’s a hardcoded GraphQL literal).

That approach would still give different operational meaning to semantically-equivalent GraphQL queries, but in a much more intuitive way that better reflects the intent. It makes sense that we’d parameterize over things the user parameterized over, and it also makes sense that we’d hardcode values the user hardcoded. We’re unlikely to get value out of parameterizing over hardcoded values, because they’re unlikely to ever change, and passing them into the SQL exposes more optimization opportunities to the Postgres query planner.
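To make the distinction concrete: in a subscription written with a variable, e.g. `matches(where: {status: {_eq: $status}})`, the value would still be parameterized, while the hardcoded `3` from this issue would be inlined. At the SQL level, the difference looks roughly like this (illustrative shapes, not Hasura’s exact generated SQL):

```sql
-- Hardcoded GraphQL literal: inline the value, so the planner sees it
-- and can use the row statistics for status = 3.
SELECT id FROM matches WHERE status = 3 LIMIT 5;

-- GraphQL variable: keep it as a parameter; the plan must then work
-- for every possible value of $1.
PREPARE live_matches (integer) AS
  SELECT id FROM matches WHERE status = $1 LIMIT 5;
EXECUTE live_matches (3);
```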
However, implementing this is a little tricky, because we’d have to somehow track where values came from after parsing the GraphQL query such that we can parameterize over them later. That’s doable, of course, but there are thorny edge cases, and it might be subtle to get right.
First off, thanks so much for the detailed look into this. That’s a fascinating breakdown, and I appreciate you going through the effort.
I don’t anticipate you reverting #2942, nor do I expect you to. The solution of only parameterizing GraphQL variables makes sense to me, but it doesn’t ultimately solve the problem, right? It solves my use case, but someone else could come along with the same problem arising out of GraphQL variables. So while it’s helpful to me, I could see it not being helpful to everyone. I’m in favor of it, if the Hasura team is willing to pursue it, of course.
I wish I could pass a header or something with the query to bypass the parameterization, but since this is a subscription over websockets, I can’t even do that. Maybe there’s another flag I can pass via the query or POST body, though.
All in all, I appreciate you looking into this and hope we can find a solution that works. If there’s anything I can do on my end to help, please let me know.