[optimisation] select only required columns on sql

Hi,

I found a behavior that could probably be optimized quite easily.

Let’s say I have a table named article with 100 columns, including id. I want to query the id of every article, so I send this request.

query {
  article {
    id
  }
}

The executed query looks like this.

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_1_e"
          FROM
            (
              SELECT
                "_0_root.base"."id" AS "id"
            ) AS "_1_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."article"
        WHERE
          ('true')
      ) AS "_0_root.base"
  ) AS "_2_root"

The problem is that the innermost subquery selects every column (*).

SELECT
  *
FROM 
  "public"."article"

It could, and should, select only the columns requested by the client, like this.

SELECT
  id
FROM 
  "public"."article"

This might be a considerable performance problem in some cases. For example, my table article has 100 columns, so a simple query asking for the single column id could result in up to 100 times more IO, memory usage, and time on the database than expected.

If resolving this is not as easy as it looks, please let me know concretely how I can help, and I may open a PR if I have the time. And if I have misunderstood how postgres handles this sql, I’d be thankful to be told that as well!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
marionschleifer commented, Aug 26, 2019

@jjangga0214 thank you for clarifying. I just edited my previous comment and now I’ve seen that you clarified. Sorry for the confusion 🙏

Yes, we’ll definitely add it to the documentation. Thank you again for your input!

0 reactions
jjangga0214 commented, Aug 26, 2019

@marionschleifer Uh… no. Thank you for the response, but you misunderstood what I meant.

“For example, some manually written graphql APIs do select unnecessary columns”

This means, literally, that many people write resolvers that select columns the client did not request. For example, when there’s a query articles that returns every article, the resolvers would look like this.

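const resolvers = {
  Query: {
    // Fetches every column, whatever fields the client asked for.
    // `sql` stands for some query helper; it is not defined in this snippet.
    articles: () => sql('SELECT * FROM article')
  },
  Article: {
    // Field resolvers only pick values off the row that was already fetched.
    id: (parent) => parent.id,
    title: (parent) => parent.title,
    content: (parent) => parent.content,
  }
}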

A resolver chain is a tree, which separates the field resolvers from the sql. This typically results in selecting every column. People do know it is inefficient. However, the tree is an easy structure to write, and people don’t care about micro-optimization until it is needed. It can become a problem, though, when working with a table that has a lot of columns (more than a hundred).
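For hand-written resolvers, one way around this is to read the requested fields from the `info` argument that graphql-js passes to every resolver and build the column list from it. The following is only a minimal sketch of that idea, not how hasura does it: `ARTICLE_COLUMNS`, `requestedColumns`, `buildArticleQuery` and `fakeInfo` are hypothetical names made up for this example, the column whitelist is an assumption, and fragments are simply ignored.

```javascript
// Hypothetical whitelist of columns on the `article` table (an assumption).
// Only whitelisted names ever reach the SQL string.
const ARTICLE_COLUMNS = new Set(['id', 'title', 'content']);

// Collect the field names the client asked for from a graphql-js style
// `info` object (the fourth resolver argument). Fragment spreads and inline
// fragments are skipped to keep the sketch short.
function requestedColumns(info, allowed) {
  const selections = info.fieldNodes[0].selectionSet.selections;
  const columns = selections
    .filter((node) => node.kind === 'Field')
    .map((node) => node.name.value)
    .filter((name) => allowed.has(name));
  // De-duplicate, and fall back to the primary key if nothing matched.
  const unique = [...new Set(columns)];
  return unique.length > 0 ? unique : ['id'];
}

// Build a SELECT that names only the requested columns.
function buildArticleQuery(info) {
  const columns = requestedColumns(info, ARTICLE_COLUMNS)
    .map((name) => `"${name}"`)
    .join(', ');
  return `SELECT ${columns} FROM "public"."article"`;
}

// Minimal stand-in for the `info` object of the query `{ articles { id } }`.
const fakeInfo = {
  fieldNodes: [
    {
      selectionSet: {
        selections: [{ kind: 'Field', name: { value: 'id' } }],
      },
    },
  ],
};

console.log(buildArticleQuery(fakeInfo));
```

In a real resolver this would be called as `articles: (parent, args, context, info) => sql(buildArticleQuery(info))`, with `sql` being the same undefined helper as in the snippet above. The cost is that every root resolver now has to know about the selection set, which is exactly the coupling the resolver tree was meant to avoid.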

“and if users know hasura solves the issue, it could be helpful for them to decide using hasura.”

So, if people who have this problem understand that hasura guarantees selecting only the requested columns, they might find it easier to decide to use hasura.

“What if docs mentions about this for making users feel certain?”

That’s why I suggested that explicitly mentioning this fact (“hasura only selects the columns requested in the graphql query, as it’s a graphql compiler to sql.”) would be good. Currently, as far as I know, the docs only say hasura is a realtime (if the statement is not “prepared” yet) graphql compiler to sql.
