Issues batching queries with gatsby-source-graphql
Hey folks - first off, thanks for building Gatsby! It’s been a pleasure to use overall.
Description
Context: My team is hitting an issue similar to #13425 & #19803 (both of which were closed due to inactivity). I also mentioned this in https://github.com/gatsbyjs/gatsby/discussions/28680 & figure it’s worth opening an official issue at this point.
Effectively we’re hitting an issue with gatsby-source-graphql generating n+1 queries to a remote GraphQL API (in our case the GitHub GraphQL API). We have a page template that queries the git commit history of a file, a query that is often identical across pages (for each of the 30+ languages we support, we query the commit history of the corresponding English file, e.g. the query to GitHub for this English page is identical to this translated page). As our site continues to grow, we’re quickly hitting GitHub’s API rate limit of 5,000 points per hour, causing our site builds to frequently fail.
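To make the duplication concrete, here is a small sketch. The page list, paths and the `countQueries` helper are hypothetical and only illustrate the pattern described above: one remote query is issued per page, while the number of distinct queries equals the number of English source files.

```javascript
// Hypothetical page list (paths and file names are made up for illustration).
// Every translated page asks for the commit history of its English source file.
const pages = [
  { path: "/en/what-is-ethereum/", sourceFile: "content/what-is-ethereum/index.md" },
  { path: "/de/what-is-ethereum/", sourceFile: "content/what-is-ethereum/index.md" },
  { path: "/fr/what-is-ethereum/", sourceFile: "content/what-is-ethereum/index.md" },
  { path: "/en/eth/", sourceFile: "content/eth/index.md" },
  { path: "/de/eth/", sourceFile: "content/eth/index.md" },
]

// Compare how many remote queries are issued (one per page)
// with how many distinct queries there really are (one per source file).
function countQueries(pageList) {
  const distinctFiles = new Set(pageList.map((page) => page.sourceFile))
  return { issued: pageList.length, distinct: distinctFiles.size }
}

console.log(countQueries(pages)) // { issued: 5, distinct: 2 }
```

With 30+ languages, each English file is queried 30+ times in this pattern, which is where the rate limit pressure comes from.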
Issues w/ Apollo Link batching: Both @KyleAMathews & @vladar were kind enough to reply to my discussion thread (https://github.com/gatsbyjs/gatsby/discussions/28680) & recommended using Apollo Link. I’ve attempted to set up query batching using Apollo Link but haven’t been successful.
Here’s what I’ve tried.
- Batching with HttpLinkDataLoader as mentioned in the gatsby-source-graphql docs, in gatsby-config.js:
{
  resolve: `gatsby-source-graphql`,
  options: {
    typeName: `GitHub`,
    fieldName: `github`,
    createLink: () => {
      return new HTTPLinkDataloader({
        uri: `https://api.github.com/graphql`,
        fetch,
        headers: {
          Authorization: `Bearer ${GITHUB_TOKEN_READ_ONLY}`,
        },
      })
    },
  },
},
Which returns this error on build:
Error: GraphQL Error: {
"message": "Body should be a JSON object",
"documentation_url": "https://docs.github.com/graphql",
"status": 400
}
(Sidenote: I’m keen to know why you recommend HttpLinkDataLoader vs apollo-link-batch-http. Is there an example of a Gatsby project using HttpLinkDataLoader which I could follow?)
- Batching with apollo-link-batch-http, in gatsby-config.js:
{
  resolve: `gatsby-source-graphql`,
  options: {
    typeName: `GitHub`,
    fieldName: `github`,
    createLink: () => {
      return ApolloLink.from([
        loggerLink,
        errorLink,
        new BatchHttpLink({
          uri: `https://api.github.com/graphql`,
          headers: {
            Authorization: `Bearer ${GITHUB_TOKEN_READ_ONLY}`,
          },
          fetch,
        }),
      ])
    },
  },
},
Which returns this error on build:
GraphQL Request: IntrospectionQuery
Network Error: Response not successful: Received status code 400
ERROR #11321 PLUGIN
"gatsby-source-graphql" threw an error while running the sourceNodes lifecycle:
Response not successful: Received status code 400
ServerError: Response not successful: Received status code 400
- index.ts:114 Object.exports.throwServerError
[ethereum-org-website]/[apollo-link-http-common]/src/index.ts:114:17
- index.ts:145
[ethereum-org-website]/[apollo-link-http-common]/src/index.ts:145:11
- task_queues.js:97 processTicksAndRejections
Steps to reproduce
This PR shows the various approaches I took in each commit: https://github.com/ethereum/ethereum-org-website/pull/2295
Expected result
I was hoping that these implementations would “just work” to batch our GraphQL queries & avoid rate limits as a result.
Actual result
Builds failed (see above).
Environment
$ gatsby info --clipboard
System:
OS: macOS 10.15.5
CPU: (8) x64 Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
Shell: 5.0.7 - /usr/local/bin/bash
Binaries:
Node: 12.19.1 - ~/.nvm/versions/node/v12.19.1/bin/node
Yarn: 1.19.1 - /usr/local/bin/yarn
npm: 6.14.11 - ~/.nvm/versions/node/v12.19.1/bin/npm
Languages:
Python: 2.7.16 - /usr/bin/python
Browsers:
Chrome: 88.0.4324.96
Firefox: 82.0.3
Safari: 13.1.1
npmPackages:
gatsby: ^2.31.0 => 2.31.0
gatsby-image: ^2.4.7 => 2.7.0
gatsby-plugin-intl: ^0.3.3 => 0.3.3
gatsby-plugin-lodash: ^3.3.11 => 3.6.0
gatsby-plugin-manifest: ^2.4.13 => 2.8.0
gatsby-plugin-matomo: ^0.8.3 => 0.8.3
gatsby-plugin-mdx: ^1.2.15 => 1.6.0
gatsby-plugin-react-helmet: ^3.3.4 => 3.6.0
gatsby-plugin-react-helmet-canonical-urls: ^1.4.0 => 1.4.0
gatsby-plugin-sharp: ^2.6.11 => 2.10.1
gatsby-plugin-sitemap: ^2.4.7 => 2.8.0
gatsby-plugin-styled-components: ^3.3.3 => 3.6.0
gatsby-remark-autolink-headers: ^2.3.5 => 2.7.0
gatsby-remark-copy-linked-files: ^2.4.0 => 2.6.0
gatsby-remark-images: ^3.3.12 => 3.7.0
gatsby-source-filesystem: ^2.3.10 => 2.7.0
gatsby-source-graphql: ^2.13.0 => 2.13.0
gatsby-transformer-csv: ^2.3.10 => 2.6.0
gatsby-transformer-gitinfo: ^1.1.0 => 1.1.0
gatsby-transformer-remark: ^2.8.16 => 2.12.0
gatsby-transformer-sharp: ^2.5.5 => 2.8.0
npmGlobalPackages:
gatsby-cli: 2.16.1
Additional thoughts
We’re open to additional solutions here, e.g. is there a way to cache results for these duplicate queries to avoid rate limits? Is there a way for these requests to fail gracefully (e.g. return empty results) vs breaking the build? Is there an alternative to gatsby-source-graphql, like https://github.com/gatsbyjs/gatsby-graphql-toolkit?
Thanks in advance!
Top GitHub Comments
Setting option batch to true is a no-op for your case because you provide your own link. When this option is set the plugin simply replaces the default HTTP link with DataLoader link (but you already do it manually): https://github.com/gatsbyjs/gatsby/blob/8b6bfa61a8ed502df71057409862b7771b096156/packages/gatsby-source-graphql/src/gatsby-node.js#L61
To control the level of batching you can play with the GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY env variable and dataLoaderOptions (see the full list of options).
Maybe try increasing GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY to 40 and also increase maxBatchSize to 10. Another important option for dataloader is batchScheduleFn. By default DataLoaderLink batches queries that were started within a 50ms time span. You can experiment by setting it to a higher value, like 100ms or 150ms.
To recap, this is how your config could look with those options adjusted:
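A minimal sketch of such a config, not necessarily the exact snippet the maintainer posted. It assumes the plugin's documented `url`, `headers`, `batch` and `dataLoaderOptions` options are used in place of a custom `createLink` (so `loggerLink` and `errorLink` are left out), and that the token is read from `process.env`. The values are the starting points suggested above, not tuned recommendations.

```javascript
// gatsby-config.js (sketch)
module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-graphql`,
      options: {
        typeName: `GitHub`,
        fieldName: `github`,
        url: `https://api.github.com/graphql`,
        headers: {
          // Assumption: the token is exposed as an environment variable
          Authorization: `Bearer ${process.env.GITHUB_TOKEN_READ_ONLY}`,
        },
        // Use the plugin's built-in DataLoader link.
        // This only takes effect when no createLink is provided.
        batch: true,
        dataLoaderOptions: {
          // Merge up to 10 queries into one request
          maxBatchSize: 10,
          // Collect queries started within 100ms into the same batch
          batchScheduleFn: (callback) => setTimeout(callback, 100),
        },
      },
    },
  ],
}
```

The concurrency variable is set in the environment of the build, for example `GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY=40 gatsby build`.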
And also don’t forget to bump GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY (play with values from 20 to 40 when using DataloaderLink).
As for the number of queries sent - batching is opaque to loggerLink. Logger link “thinks” that all GraphQL queries are sent separately, it doesn’t know that they are batched at a lower level.
I think you will have to count the number of actual HTTP requests manually - maybe by wrapping a fetch function with your own counter, e.g.:
Hey @samajammin!
apollo-link-batch-http only works with Apollo Server (and other servers that support Apollo-style batching). This flavor of batching uses a special format of the HTTP request. Read more on their approach here.
In other words, they batch queries at HTTP-request/response level.
I assume GitHub Servers do not support this kind of batching and they don’t understand this HTTP request format. As a result, you get an HTTP 400 error in response (which is “Bad request”).
In gatsby-source-graphql we’ve introduced the other kind of batching - batching at GraphQL level. And it is supported by any spec-compliant GraphQL server.
If you want to use it in the Apollo link chain, try something like this:
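A hedged sketch of what such a chain might look like. It assumes the plugin's internal module `gatsby-source-graphql/batching/dataloader-link` exports `createDataloaderLink` and that it accepts `uri`, `headers` and `fetch` the way the plugin passes them internally, as the gatsby-node.js source linked earlier in the thread suggests. This is an internal path rather than a documented API, so check it against your installed version. The `loggerLink` here is a stand-in for the one in the issue, `errorLink` is omitted for brevity, and `countingFetch` is a hypothetical helper that follows the earlier suggestion to count real HTTP requests by wrapping fetch.

```javascript
// gatsby-config.js (sketch)
const fetch = require(`node-fetch`)
const { ApolloLink } = require(`apollo-link`)
// Assumption: internal, undocumented module path; verify it in your installed version
const { createDataloaderLink } = require(`gatsby-source-graphql/batching/dataloader-link`)

// Stand-in for the loggerLink from the issue: logs each operation, then passes it on
const loggerLink = new ApolloLink((operation, forward) => {
  console.log(`GraphQL Request: ${operation.operationName}`)
  return forward(operation)
})

// Counts real HTTP requests, since batching is invisible to loggerLink
let httpRequestCount = 0
const countingFetch = (uri, options) => {
  httpRequestCount += 1
  console.log(`HTTP request #${httpRequestCount}`)
  return fetch(uri, options)
}

module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-graphql`,
      options: {
        typeName: `GitHub`,
        fieldName: `github`,
        createLink: () =>
          ApolloLink.from([
            loggerLink,
            // Terminating link: merges queries at the GraphQL level before sending
            createDataloaderLink({
              uri: `https://api.github.com/graphql`,
              headers: {
                Authorization: `Bearer ${process.env.GITHUB_TOKEN_READ_ONLY}`,
              },
              fetch: countingFetch,
            }),
          ]),
      },
    },
  ],
}
```

Comparing the number of `GraphQL Request` lines with the number of `HTTP request` lines in the build output should show whether batching is taking effect.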