[optimisation] select only required columns on sql

Hi,

I found a behavior that can probably be optimized easily.

Let’s say I have a table named article that has 100 columns, including id. I wanted to query the id of every article, so I sent this request.

query {
  article {
    id
  }
}

The SQL that gets executed looks like this.

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_1_e"
          FROM
            (
              SELECT
                "_0_root.base"."id" AS "id"
            ) AS "_1_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."article"
        WHERE
          ('true')
      ) AS "_0_root.base"
  ) AS "_2_root"

The problem is that the innermost query selects every column with *.

SELECT
  *
FROM 
  "public"."article"

It could, and should, select only the columns the client requested, like this.

SELECT
  id
FROM 
  "public"."article"

This can be a considerable performance problem in some cases. For example, my article table has 100 columns, so a simple query that asks only for id could cost up to roughly 100 times more I/O, memory, and time on the database than expected.
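
To make the idea concrete, here is a minimal sketch in JavaScript of building such a projection from the requested field names. It is only an illustration, not how Hasura generates its SQL: buildSelect, quoteIdent, and the allowedColumns whitelist are hypothetical names, and the title and content columns are assumed for the example.

```javascript
// Hypothetical helper: quote a PostgreSQL identifier.
// Embedded double quotes are doubled, as quoted identifiers require.
function quoteIdent(name) {
  return '"' + String(name).replace(/"/g, '""') + '"';
}

// Hypothetical helper: build a SELECT that names only the requested columns.
// `allowedColumns` is an assumed whitelist of the table's real column names;
// requested fields that are not in it are dropped.
function buildSelect(schema, table, requestedFields, allowedColumns) {
  const columns = requestedFields.filter((field) => allowedColumns.includes(field));
  if (columns.length === 0) {
    throw new Error('no known columns requested');
  }
  const columnList = columns.map(quoteIdent).join(', ');
  return `SELECT ${columnList} FROM ${quoteIdent(schema)}.${quoteIdent(table)}`;
}

// Only `id` was requested, so only `id` is selected.
console.log(buildSelect('public', 'article', ['id'], ['id', 'title', 'content']));
```

With this shape, the innermost query would fetch one column instead of all of them, and the outer row_to_json wrapping could stay as it is.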

If resolving this is not as easy as it might seem, please let me know concretely how I can help, so that I can open a PR if my time allows. Or, if I have misunderstood how Postgres handles this SQL, I’d be thankful if you let me know as well!

Top GitHub Comments

marionschleifer commented, Aug 26, 2019

@jjangga0214 thank you for clarifying. I just edited my previous comment and now I’ve seen that you clarified. Sorry for the confusion 🙏

Yes, we’ll definitely add it to the documentation. Thank you again for your input!

jjangga0214 commented, Aug 26, 2019

@marionschleifer Uh… no. Thank you for the response, but you misunderstood what I meant.

For example, some manually written graphql APIs do select unnecessary columns

This means, literally, that many people write resolvers that select columns the client never requested. For example, when there is a query named articles that returns every article, the resolvers would look like this.

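const resolvers = {
  Query: {
    // `sql` stands for whatever query helper the API uses; this fetches every column.
    articles: () => sql('SELECT * from article')
  },
  Article: {
    // Field resolvers only read from the row that was already fetched.
    id: (parent) => parent.id,
    title: (parent) => parent.title,
    content: (parent) => parent.content,
  }
}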

A resolver chain is a tree, which separates the field resolvers from the SQL. This typically results in selecting every column. People do know it is inefficient. However, the tree is an easy structure to write, and people don’t care about micro-optimization until it is needed. It can become a problem, though, when working with a table that has a lot of columns (more than a hundred).
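
For comparison, a hand-written resolver can narrow the projection itself by reading the selection set from the fourth resolver argument (info, as provided by graphql-js). The following is a rough sketch under assumptions: requestedColumns, makeResolvers, and ARTICLE_COLUMNS are hypothetical names, sql stands for the same unspecified query helper as in the resolvers above, and fragments are ignored for brevity.

```javascript
// Assumed whitelist of the article table's columns (illustrative only).
const ARTICLE_COLUMNS = ['id', 'title', 'content'];

// Collect the names of the fields selected directly under the resolved field.
// Only plain `Field` nodes are handled; fragment spreads and inline fragments
// are skipped in this sketch, as are names outside the whitelist (e.g. __typename).
function requestedColumns(info, allowedColumns) {
  const names = new Set();
  for (const fieldNode of info.fieldNodes) {
    const selections = fieldNode.selectionSet ? fieldNode.selectionSet.selections : [];
    for (const selection of selections) {
      if (selection.kind === 'Field' && allowedColumns.includes(selection.name.value)) {
        names.add(selection.name.value);
      }
    }
  }
  return [...names];
}

// `sql` is injected so the sketch does not depend on a particular database client.
function makeResolvers(sql) {
  return {
    Query: {
      articles: (parent, args, context, info) => {
        const columns = requestedColumns(info, ARTICLE_COLUMNS);
        // Fall back to `*` when no known column was requested.
        const columnList = columns.length > 0 ? columns.join(', ') : '*';
        // Safe to interpolate: every name comes from the whitelist above.
        return sql(`SELECT ${columnList} from article`);
      },
    },
  };
}
```

This works, but every list resolver has to repeat it, and nested selections and fragments need extra handling, which is why most hand-written APIs skip it.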

and if users know hasura solves the issue, it could be helpful for them to decide using hasura.

So, if people who have this problem understand that Hasura guarantees selecting only the requested columns, they might find it easier to decide to use Hasura.

What if docs mentions about this for making users feel certain?

That’s why I suggested that explicitly mentioning this fact (“Hasura only selects the columns requested in the GraphQL query, as it’s a GraphQL-to-SQL compiler.”) would be good. Currently, as far as I know, the docs only say that Hasura is a realtime (if the statement is not “prepared” yet) GraphQL-to-SQL compiler.