[optimisation] select only required columns on sql
Hi,
I found a behavior that can probably be optimized easily.
Let’s say I have a table named article that has 100 columns, including id. I wanted to query the id of every article, so I sent this request.
query {
  article {
    id
  }
}
The executed query looks like this.
SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_1_e"
          FROM
            (
              SELECT
                "_0_root.base"."id" AS "id"
            ) AS "_1_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."article"
        WHERE
          ('true')
      ) AS "_0_root.base"
  ) AS "_2_root"
The problem is that it selects every column (*).
SELECT
  *
FROM
  "public"."article"
It could, and should, select only the columns requested by the client, like this.
SELECT
  id
FROM
  "public"."article"
This might be quite a considerable performance problem in some cases. For example, my table article has 100 columns. So a simple query that asks for only the one column id can result in up to 100 times more I/O, memory usage, and time on the database than expected.
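To make the suggestion concrete, here is a minimal sketch of how a column-restricted SELECT could be built from the fields a client asks for. This is not how Hasura generates SQL; `build_select`, `quote_ident`, and `ARTICLE_COLUMNS` are hypothetical names made up for illustration, and the three-column set stands in for the 100-column table.

```python
from typing import Iterable, Set


def quote_ident(name: str) -> str:
    """Quote an SQL identifier with double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_select(schema: str, table: str, requested: Iterable[str],
                 known_columns: Set[str]) -> str:
    """Build a SELECT that projects only the requested columns (hypothetical helper)."""
    columns = list(requested)
    if not columns:
        raise ValueError("at least one column must be requested")
    # Reject anything that is not a known column instead of passing it to SQL.
    unknown = [c for c in columns if c not in known_columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    projection = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {projection} FROM {quote_ident(schema)}.{quote_ident(table)}"


# Assumption: a tiny stand-in for the real table's column list.
ARTICLE_COLUMNS = {"id", "title", "body"}

sql = build_select("public", "article", ["id"], ARTICLE_COLUMNS)
print(sql)
```

Checking the requested names against a known column set keeps the projection limited to real columns and avoids interpolating arbitrary client input into the statement.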
If resolving this is not as easy as it might seem, let me know concretely how I can help, and I might open a PR if my time allows. Or, if I have misunderstood how Postgres handles this SQL, I’d be thankful if you let me know as well!
Issue Analytics
- Created 4 years ago
- Comments: 7 (7 by maintainers)
@jjangga0214 thank you for clarifying. I just edited my previous comment and now I’ve seen that you clarified. Sorry for the confusion 🙏
Yes, we’ll definitely add it to the documentation. Thank you again for your input!
@marionschleifer Uh… no. Thank you for the response, but you misunderstood what I meant.
This means, literally, that many people write resolvers that select columns the client did not even request. For example, when there’s a query articles, which selects every article, the resolvers would look like this.
A resolver chain is a tree, which separates field resolvers from SQL. This typically results in selecting every column. People do know it is inefficient. However, the tree provides an easy-to-write structure, and people don’t care about micro-optimization until it is needed. But this can become a problem when working with a table with lots of (more than a hundred) columns.
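A rough sketch of the resolver pattern described above, next to a projection-aware alternative. Everything here is an assumption for illustration: an in-memory SQLite table stands in for the real database, and `resolve_articles_naive` and `resolve_articles_projected` are hypothetical names, not part of any GraphQL library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO article (title, body) VALUES ('Hello', 'Long text')")

ARTICLE_COLUMNS = ("id", "title", "body")


def resolve_articles_naive(requested_fields):
    """Root resolver that ignores the requested fields and fetches whole rows."""
    cursor = conn.execute("SELECT * FROM article")
    fetched = [d[0] for d in cursor.description]  # columns actually selected
    rows = cursor.fetchall()
    # The per-field resolvers then pick out only what the client asked for.
    return [{f: row[f] for f in requested_fields} for row in rows], fetched


def resolve_articles_projected(requested_fields):
    """Root resolver that selects only the columns the client requested."""
    unknown = [f for f in requested_fields if f not in ARTICLE_COLUMNS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    projection = ", ".join(f'"{f}"' for f in requested_fields)
    cursor = conn.execute(f"SELECT {projection} FROM article")
    fetched = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    return [{f: row[f] for f in requested_fields} for row in rows], fetched


naive_result, naive_columns = resolve_articles_naive(["id"])
projected_result, projected_columns = resolve_articles_projected(["id"])
print(naive_columns, projected_columns)
```

Both resolvers return the same data to the client; the difference is only in how many columns the SQL statement selects.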
So, if people with this problem understand that Hasura guarantees selecting only the requested columns, they might find it easy to decide to use Hasura.
That’s why I suggested that explicitly mentioning this fact (“Hasura only selects the columns requested in the GraphQL query, as it’s a GraphQL-to-SQL compiler.”) would be good. Currently, as far as I know, the docs only say Hasura is a realtime (if the statement is not “prepared” yet) GraphQL-to-SQL compiler.