Issues with large page numbers (>60k)
Description
I’ve been setting up a content surfacing system using GatsbyJS, and we’re encountering a fair few issues with the number of pages we have. I’ve made a few changes as suggested in Discord, and done a fair bit of investigating into the cause of the slowdowns.
The following symptoms have been noticed:
- npm run develop is significantly slower than npm run build (50 minutes vs 2 minutes)
- Slowdown occurs during the “running graphql queries” step, but before the text has shown
- Some machines encounter an “invalid instruction” crash after the “info bootstrap finished” text (http://paste.enginehub.org/tb7s6C)
- It appears to be running a query per page, despite the fact that the pages have no queries. There appears to already be an issue for this (https://github.com/gatsbyjs/gatsby/issues/12216)
A few notes about our setup:
- All data is passed to pages/templates via the context, rather than a per-page query (as per the Discord recommendation)
- I can provide the source code to the Gatsby team privately if required
What we’ve discovered:
- The major slowdowns appear to be related to the queue library, and a QuickSort that gets run on insertion of elements. From what we can tell, the “Running graphql queries” step is actually running a list of tasks that may not be related to graphql, so it may be worth renaming this.
- As for why it significantly slows down in develop, we discovered that in build the priority function is deleted from the queue. We’ve noticed the same speedups by setting the priority of non-active paths to undefined, which skips sorting for them entirely (https://github.com/diamondio/better-queue-memory/blob/cff881f2074ff0508bcb6e932bda0b92977d3d2b/index.js#L48).
- Halving the number of pages takes the time from 50 minutes to 10 minutes, so it’s not a linear slowdown.
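To illustrate why a sort on every insertion would scale worse than linearly, here is a toy model. It is not the better-queue-memory source: `enqueueAll` and `makeTasks` are hypothetical helpers, and the only behaviour it assumes is the one described above, namely that a task with a defined priority triggers a re-sort and a task with an undefined priority does not.

```javascript
// Toy model only, NOT the better-queue-memory implementation.
// It re-sorts the whole task list whenever a prioritised task is inserted
// and counts how many comparisons that costs in total.
function enqueueAll(tasks) {
  const queue = []
  let comparisons = 0
  for (const task of tasks) {
    queue.push(task)
    if (task.priority !== undefined) {
      // Re-sort on every prioritised insert; count comparator calls.
      queue.sort((a, b) => {
        comparisons += 1
        return (b.priority || 0) - (a.priority || 0)
      })
    }
    // Tasks without a priority are simply appended, with no sort.
  }
  return { length: queue.length, comparisons }
}

// Hypothetical page tasks that all share the same priority value.
const makeTasks = (n, priority) =>
  Array.from({ length: n }, (_, i) => ({ id: `/page-${i}`, priority }))

const small = enqueueAll(makeTasks(500, 1))
const large = enqueueAll(makeTasks(1000, 1))
const unprioritised = enqueueAll(makeTasks(1000, undefined))

console.log({ small, large, unprioritised })
```

In this model, doubling the number of prioritised tasks more than doubles the comparison count, while the run without priorities does no comparisons at all. That matches the direction of what we measured, although the real queue does other work as well.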
Steps to reproduce
Using the source code that I can provide privately:
- Run npm run build, notice the time it takes to run, and that it then crashes Node.
- Run npm run develop, notice that it takes significantly longer, with the same crash.
Expected result
Gatsby should be able to handle this quantity of pages, as there are multiple sources that state they’re running ~10 million with little to no issue.
Actual result
Gatsby struggles at these page numbers.
Environment
System:
  OS: macOS 10.14.2
  CPU: (12) x64 Intel® Core™ i9-8950HK CPU @ 2.90GHz
  Shell: 3.2.57 - /bin/bash
Binaries:
  Node: 10.15.3 - ~/.nvm/versions/node/v10.15.3/bin/node
  npm: 6.8.0 - ~/.nvm/versions/node/v10.15.3/bin/npm
Languages:
  Python: 2.7.15 - /usr/local/bin/python
Browsers:
  Chrome: 72.0.3626.119
  Safari: 12.0.2
npmPackages:
  gatsby: ^2.1.23 => 2.1.23
  gatsby-image: ^2.0.31 => 2.0.31
  gatsby-plugin-catch-links: ^2.0.12 => 2.0.12
  gatsby-plugin-manifest: ^2.0.22 => 2.0.22
  gatsby-plugin-offline: ^2.0.24 => 2.0.24
  gatsby-plugin-react-helmet: ^3.0.8 => 3.0.8
  gatsby-plugin-sharp: ^2.0.25 => 2.0.25
  gatsby-plugin-styled-components: ^3.0.6 => 3.0.6
  gatsby-plugin-typescript: ^2.0.10 => 2.0.10
  gatsby-plugin-web-font-loader: ^1.0.4 => 1.0.4
  gatsby-source-filesystem: ^2.0.23 => 2.0.23
  gatsby-transformer-json: ^2.1.8 => 2.1.8
  gatsby-transformer-sharp: ^2.1.15 => 2.1.15
Issue Analytics
- Created 5 years ago
- Reactions: 5
- Comments: 5 (5 by maintainers)
@me4502 I believe we’ve fixed this with #10732 and gatsby@^2.3.20.
Closing this out, but please re-open or reply if this is not the case and you can still reproduce these OOM issues.
We’re always working on making Gatsby more scalable, and the more issues we can surface and fix, the better. Thanks for surfacing this one!
@KyleAMathews That option doesn’t appear to speed it up much. However, I’ve made a PR that brings develop performance to basically the same as build performance (https://github.com/gatsbyjs/gatsby/pull/12365)
@stefanprobst It appears that does fix the crash. Even just replacing json-stringify-safe with JSON.stringify fixes it.