Issues batching queries with gatsby-source-graphql

Hey folks - first off, thanks for building Gatsby! It’s been a pleasure to use overall.

Description

Context: My team is hitting an issue similar to #13425 & #19803 (both of which were closed due to inactivity). I also mentioned this in https://github.com/gatsbyjs/gatsby/discussions/28680 & figure it’s worth opening an official issue at this point.

Effectively we’re hitting an issue with gatsby-source-graphql generating n+1 queries to a remote GraphQL API (in our case the GitHub GraphQL API). We have a page template that queries the git commit history of a file, and that query is often identical across pages: for each of the 30+ languages we support, we query the commit history of the corresponding English file, so the query sent to GitHub for an English page is identical to the one sent for each of its translations. As our site continues to grow, we’re quickly hitting GitHub’s API rate limit of 5,000 points per hour, causing our site builds to frequently fail.

Issues w/ Apollo Link batching: Both @KyleAMathews & @vladar were kind enough to reply to my discussion thread (https://github.com/gatsbyjs/gatsby/discussions/28680) & recommended using Apollo Link. I’ve attempted to set up query batching using Apollo Link but haven’t been successful.

Here’s what I’ve tried.

  1. Batching with HttpLinkDataLoader as mentioned in the gatsby-source-graphql docs

in gatsby-config.js:

    {
      resolve: `gatsby-source-graphql`,
      options: {
        typeName: `GitHub`,
        fieldName: `github`,
        createLink: () => {
          return new HTTPLinkDataloader({
            uri: `https://api.github.com/graphql`,
            fetch,
            headers: {
              Authorization: `Bearer ${GITHUB_TOKEN_READ_ONLY}`,
            },
          })
        },
      },
    },

Which returns this error on build:

  Error: GraphQL Error: {
    "message": "Body should be a JSON object",
    "documentation_url": "https://docs.github.com/graphql",
    "status": 400
  }

(Sidenote: I’m keen to know why you recommend HttpLinkDataLoader vs apollo-link-batch-http. Is there an example of a Gatsby project using HttpLinkDataLoader which I could follow?)

  2. Batching with apollo-link-batch-http

in gatsby-config.js:

    {
      resolve: `gatsby-source-graphql`,
      options: {
        typeName: `GitHub`,
        fieldName: `github`,
        createLink: () => {
          return ApolloLink.from([
            loggerLink,
            errorLink,
            new BatchHttpLink({
              uri: `https://api.github.com/graphql`,
              headers: {
                Authorization: `Bearer ${GITHUB_TOKEN_READ_ONLY}`,
              },
              fetch,
            }),
          ])
        },
      },
    },

Which returns this error on build:

GraphQL Request: IntrospectionQuery
Network Error: Response not successful: Received status code 400

 ERROR #11321  PLUGIN

"gatsby-source-graphql" threw an error while running the sourceNodes lifecycle:

Response not successful: Received status code 400

  ServerError: Response not successful: Received status code 400

  - index.ts:114 Object.exports.throwServerError
    [ethereum-org-website]/[apollo-link-http-common]/src/index.ts:114:17

  - index.ts:145
    [ethereum-org-website]/[apollo-link-http-common]/src/index.ts:145:11

  - task_queues.js:97 processTicksAndRejections

Steps to reproduce

This PR shows the various approaches I took in each commit: https://github.com/ethereum/ethereum-org-website/pull/2295

Expected result

I was hoping that these implementations would “just work” to batch our GraphQL queries & avoid rate limits as a result.

Actual result

Builds failed (see above).

Environment

$ gatsby info --clipboard

  System:
    OS: macOS 10.15.5
    CPU: (8) x64 Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    Shell: 5.0.7 - /usr/local/bin/bash
  Binaries:
    Node: 12.19.1 - ~/.nvm/versions/node/v12.19.1/bin/node
    Yarn: 1.19.1 - /usr/local/bin/yarn
    npm: 6.14.11 - ~/.nvm/versions/node/v12.19.1/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  Browsers:
    Chrome: 88.0.4324.96
    Firefox: 82.0.3
    Safari: 13.1.1
  npmPackages:
    gatsby: ^2.31.0 => 2.31.0
    gatsby-image: ^2.4.7 => 2.7.0
    gatsby-plugin-intl: ^0.3.3 => 0.3.3
    gatsby-plugin-lodash: ^3.3.11 => 3.6.0
    gatsby-plugin-manifest: ^2.4.13 => 2.8.0
    gatsby-plugin-matomo: ^0.8.3 => 0.8.3
    gatsby-plugin-mdx: ^1.2.15 => 1.6.0
    gatsby-plugin-react-helmet: ^3.3.4 => 3.6.0
    gatsby-plugin-react-helmet-canonical-urls: ^1.4.0 => 1.4.0
    gatsby-plugin-sharp: ^2.6.11 => 2.10.1
    gatsby-plugin-sitemap: ^2.4.7 => 2.8.0
    gatsby-plugin-styled-components: ^3.3.3 => 3.6.0
    gatsby-remark-autolink-headers: ^2.3.5 => 2.7.0
    gatsby-remark-copy-linked-files: ^2.4.0 => 2.6.0
    gatsby-remark-images: ^3.3.12 => 3.7.0
    gatsby-source-filesystem: ^2.3.10 => 2.7.0
    gatsby-source-graphql: ^2.13.0 => 2.13.0
    gatsby-transformer-csv: ^2.3.10 => 2.6.0
    gatsby-transformer-gitinfo: ^1.1.0 => 1.1.0
    gatsby-transformer-remark: ^2.8.16 => 2.12.0
    gatsby-transformer-sharp: ^2.5.5 => 2.8.0
  npmGlobalPackages:
    gatsby-cli: 2.16.1

Additional thoughts

We’re open to additional solutions here, e.g. is there a way to cache results for these duplicate queries to avoid rate limits? Is there a way for these requests to fail gracefully (e.g. return empty results) vs breaking the build? Is there an alternative to gatsby-source-graphql, like https://github.com/gatsbyjs/gatsby-graphql-toolkit?

Thanks in advance!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
vladar commented, Jan 25, 2021

Setting option batch to true is a no-op for your case because you provide your own link. When this option is set the plugin simply replaces the default HTTP link with DataLoader link (but you already do it manually):

https://github.com/gatsbyjs/gatsby/blob/8b6bfa61a8ed502df71057409862b7771b096156/packages/gatsby-source-graphql/src/gatsby-node.js#L61

To control the level of batching you can play with the GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY env variable and dataLoaderOptions (see the full list of options).

Maybe try increasing GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY to 40 and also increasing maxBatchSize to 10. Another important option for dataloader is batchScheduleFn. By default DataLoaderLink batches queries that were started within a 50ms time span. You can experiment by setting it to a higher value, like 100ms or 150ms.
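To make the interaction between the batching window and maxBatchSize concrete, here is a minimal sketch of a batcher. It is not the DataLoader implementation and not what the plugin runs: createBatcher and its internals are hypothetical names made up for illustration, and only the option names maxBatchSize and batchScheduleFn are taken from the comment above. The assumption is simply that everything requested before the scheduled callback fires is grouped, then split into chunks of at most maxBatchSize.

```javascript
// Hypothetical mini-batcher for illustration only; NOT the DataLoader source.
function createBatcher(batchFn, { maxBatchSize = Infinity, batchScheduleFn }) {
  let queue = []

  function flush() {
    const pending = queue
    queue = []
    // Split everything collected during the window into chunks of maxBatchSize.
    for (let i = 0; i < pending.length; i += maxBatchSize) {
      const chunk = pending.slice(i, i + maxBatchSize)
      batchFn(chunk.map(item => item.key)).then(
        results => chunk.forEach((item, j) => item.resolve(results[j])),
        error => chunk.forEach(item => item.reject(error))
      )
    }
  }

  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject })
      // The first key of a window starts the timer; later keys just join the queue.
      if (queue.length === 1) batchScheduleFn(flush)
    })
  }
}

const batchSizes = []
const load = createBatcher(
  async keys => {
    batchSizes.push(keys.length) // one entry per simulated HTTP request
    return keys.map(key => `result for ${key}`)
  },
  { maxBatchSize: 10, batchScheduleFn: callback => setTimeout(callback, 100) }
)

// 25 loads issued in the same tick all land in one 100ms window.
Promise.all(Array.from({ length: 25 }, (_, i) => load(i))).then(() => {
  console.log(batchSizes) // [ 10, 10, 5 ]
})
```

In this sketch a longer window only helps if enough queries are actually in flight during it, which is presumably why raising the query concurrency is suggested alongside the dataloader options.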

To recap, this is what your config could look like with those options adjusted:

const { createDataloaderLink } = require(`gatsby-source-graphql/batching/dataloader-link`)

{
  resolve: `gatsby-source-graphql`,
  options: {
    typeName: `GitHub`,
    fieldName: `github`,
    createLink: () => {
      return ApolloLink.from([
        loggerLink,
        errorLink,
        createDataloaderLink({
          uri: `https://api.github.com/graphql`,
          headers: {
            Authorization: `Bearer ${GITHUB_TOKEN_READ_ONLY}`,
          },
          dataLoaderOptions: {
            maxBatchSize: 10,
            batchScheduleFn: callback => setTimeout(callback, 100),
          },
          fetch,
        }),
      ])
    },
  },
},

And also don’t forget to bump GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY (play with values from 20 to 40 when using DataloaderLink).

As for the number of queries sent - batching is opaque to loggerLink. Logger link “thinks” that all GraphQL queries are sent separately, it doesn’t know that they are batched at a lower level.

I think you will have to count the number of actual HTTP requests manually - maybe by wrapping a fetch function with your own counter, e.g.:

let requests = 0
const myFetch = (...args) => {
  requests++
  return fetch(...args)
}

{
  resolve: `gatsby-source-graphql`,
  options: {
    typeName: `GitHub`,
    fieldName: `github`,
    createLink: () => {
      return ApolloLink.from([
        // ...
        createDataloaderLink({
          // ...
          fetch: myFetch,
        }),
      ])
    },
  },
},
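As a runnable variant of the counter idea, the sketch below wraps an arbitrary fetch-compatible function and exposes the count. createCountingFetch and fakeFetch are hypothetical helpers introduced only so the example runs without network access; in a real config the wrapped function would be whatever fetch implementation the project already passes to the link.

```javascript
// Wraps any fetch-compatible function and counts how many times it is called.
function createCountingFetch(baseFetch) {
  const stats = { requests: 0 }
  const countingFetch = (...args) => {
    stats.requests++
    return baseFetch(...args)
  }
  return { countingFetch, stats }
}

// Stand-in for a real fetch so the sketch runs offline (hypothetical).
const fakeFetch = async (url, options) => ({ ok: true, status: 200, url })

const { countingFetch, stats } = createCountingFetch(fakeFetch)

// Each call here represents one actual HTTP request, batched or not.
countingFetch("https://api.github.com/graphql", { method: "POST" })
countingFetch("https://api.github.com/graphql", { method: "POST" })

console.log(`HTTP requests sent: ${stats.requests}`)
```

One way to read the total at the end of a build, assuming the counter lives in the same Node process as the link, would be to log stats.requests from a process.on("exit", ...) handler.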

2 reactions
vladar commented, Jan 22, 2021

Hey @samajammin !

apollo-link-batch-http only works with Apollo Server (and other servers that support Apollo-style batching). This flavor of batching uses a special format of the HTTP request. Read more on their approach here.

In other words, they batch queries at HTTP-request/response level.

I assume GitHub Servers do not support this kind of batching and they don’t understand this HTTP request format. As a result, you get an HTTP 400 error in response (which is “Bad request”).

In gatsby-source-graphql we’ve introduced the other kind of batching - batching at GraphQL level. And it is supported by any spec-compliant GraphQL server.
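The difference between the two kinds of batching can be sketched by looking at the request body each one produces. This is a simplified illustration, not the plugin's code: mergeSelections and the q0, q1, ... alias scheme are hypothetical, and the example selections are only placeholders. A server that expects a single JSON object per request, as the "Body should be a JSON object" error quoted earlier in this thread suggests GitHub does, would reject the array form but accept the merged one.

```javascript
// Three top-level selections that would normally be three separate queries.
const selections = [
  `repository(owner: "ethereum", name: "ethereum-org-website") { id }`,
  `viewer { login }`,
  `rateLimit { remaining }`,
]

// HTTP-level (Apollo-style) batching: one request whose body is a JSON
// array containing one operation per original query.
const apolloStyleBody = JSON.stringify(
  selections.map(selection => ({ query: `query { ${selection} }` }))
)

// GraphQL-level batching: a single operation in which every original
// selection gets its own alias (the alias scheme here is hypothetical).
function mergeSelections(parts) {
  const aliased = parts.map((part, i) => `  q${i}: ${part}`)
  return `query {\n${aliased.join("\n")}\n}`
}

const mergedQuery = mergeSelections(selections)
const mergedBody = JSON.stringify({ query: mergedQuery })

console.log(mergedQuery)
console.log(Array.isArray(JSON.parse(apolloStyleBody))) // true: body is an array
console.log(Array.isArray(JSON.parse(mergedBody))) // false: body is one object
```

Because the merged form is an ordinary query as far as the server is concerned, the batching layer has to split the aliased results back out to the original callers after the response arrives.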

If you want to use it in the Apollo link chain, try something like this:

const { createDataloaderLink } = require(`gatsby-source-graphql/batching/dataloader-link`)

{
  resolve: `gatsby-source-graphql`,
  options: {
    typeName: `GitHub`,
    fieldName: `github`,
    createLink: () => {
      return ApolloLink.from([
        loggerLink,
        errorLink,
        createDataloaderLink({
          uri: `https://api.github.com/graphql`,
          headers: {
            Authorization: `Bearer ${GITHUB_TOKEN_READ_ONLY}`,
          },
          fetch,
        }),
      ])
    },
  },
},