Is there a hard limit on the maximum number of pages that Gatsby can build?
I’m trying to build a site with ~150k pages (probably more by the time I get closer to finishing) using Gatsby with a CSV file as the data source. I initially had a sample dataset of about 100 rows in a CSV file, developed my initial pages against it, and it worked. When I tried running gatsby build with all 150k rows, the build got stuck in the “source and transform nodes” step.
As suggested by @KyleAMathews, I split the large CSV into multiple files (with a varying number of rows depending on the data). The build now finishes “source and transform nodes” in about 100s, but then fails with a heap out of memory error.
I also tried running the create pages benchmark site with 125k pages and it fails with the same error, while with 100k pages it builds the site in less than 2 minutes.
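When chasing a heap out of memory error, it can help to know how much heap the Node process is allowed to use in the first place. A minimal sketch using Node's built-in v8 module (the variable names are arbitrary):

```javascript
// Print the V8 heap size limit for the current Node process, in MB.
const v8 = require("v8");

const limitBytes = v8.getHeapStatistics().heap_size_limit;
const limitMb = Math.round(limitBytes / (1024 * 1024));

console.log(`heap size limit: ${limitMb} MB`);
```

The limit can be raised with Node's --max-old-space-size flag, though that only postpones the failure if memory use keeps growing with the page count.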
I tried figuring out the underlying issue myself. Starting from the page creation docs, I reached the pages reducer and found that it uses a JavaScript Map for the state.
I was wondering if there’s a hard limit on the number of items that can be set in a Map. From this StackOverflow answer, it looks like a Map can hold only up to 2^24 (roughly 16.7 million) items. I’m not sure what else this redux state holds, but if it’s storing only the pages, does that become a hard limit for the number of pages that Gatsby can build?
There are a lot of places where Map is used in the Gatsby source code. Could one of them be causing this out of memory error?
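One way to test the Map hypothesis directly is to fill a Map with more entries than the site has pages and check its size. This is only a sketch: the paths and page objects below are made up for illustration and are not the shape of Gatsby's real page state.

```javascript
// Sketch: can a Map hold more entries than the ~150k pages in question?
// The keys and values are hypothetical, not Gatsby's actual redux state.
const PAGE_COUNT = 200000;
const pages = new Map();

for (let i = 0; i < PAGE_COUNT; i++) {
  pages.set(`/page-${i}/`, { path: `/page-${i}/`, component: "template.js" });
}

console.log(pages.size);
```

If this runs without error, the number of Map entries alone is unlikely to be the limit, and overall heap use is the more plausible suspect.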
Top GitHub Comments
You might think I have forgotten about this.
But you’d be wrong.
And happy.
Debugging the problem in this build turned out to be a deep rabbit hole and it took me some time to get into it, and back out of it. But I’m happy to report I can build your site in ~10 minutes now.
You’ll have to wait a bit before you can do this but there are some fixes / workarounds upcoming.
The basic gist is that the way nodes are looked up has a shortcut for querying by id. Unfortunately this heuristic is not optimal and fails to hit the mark in your case. That led to a bunch of other things and will need to be fixed on Gatsby’s side.
After that, the run queries step drops to ~10 minutes (down from 257 minutes, or roughly 4.3 hours, as you can see above). Which makes me very happy :d
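To illustrate why a lookup by id can be so much cheaper than a lookup on another field, here is a small sketch. It is not Gatsby's internal code: the node shape, the nodesById map, and both helper functions are invented for the example.

```javascript
// Hypothetical node store keyed by id (illustration only).
const nodesById = new Map();
for (let i = 0; i < 1000; i++) {
  nodesById.set(`node-${i}`, { id: `node-${i}`, slug: `slug-${i}` });
}

// Fast path: a direct Map lookup, independent of how many nodes exist.
function findById(id) {
  return nodesById.get(id);
}

// Slow path: scans the nodes one by one until a slug matches.
function findBySlug(slug) {
  for (const node of nodesById.values()) {
    if (node.slug === slug) return node;
  }
  return undefined;
}

// Both paths find the same node object; only the cost differs.
console.log(findById("node-999") === findBySlug("slug-999"));
```

If every one of N page queries has to scan up to N nodes, the total work grows with N squared, which is the kind of cost a by-id shortcut is meant to avoid.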
What you’re waiting on now is for me to polish this fix and make sure the generic assumptions hold (is your site a one-off or are most sites like yours?), and then we should be good to go.
Now https://github.com/gatsbyjs/gatsby/pull/20609 has landed in master. This is the part on our side that you’ll need in order to see improvements. (It still needs to be published; if you’re not comfortable building from source, it usually doesn’t take long to get published.)
The other change is to your repo. It’s changing the index from slug to id:
src/templates/ifsc.tsx:
gatsby-node.js:
and later in that file:
I think that should suffice.
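As a rough sketch of what switching the lookup key from slug to id could look like in gatsby-node.js: the node type allIfscCsv, the slug field, and the page path format are guesses for illustration, not the actual contents of the repo. Only createPage with path, component, and context is Gatsby's documented action.

```javascript
// Sketch only: `allIfscCsv`, `slug`, and the path format are hypothetical.
// In a real gatsby-node.js this function would be assigned to exports.createPages.
async function createPages({ graphql, actions }) {
  const { createPage } = actions;

  const result = await graphql(`
    {
      allIfscCsv {
        nodes {
          id
          slug
        }
      }
    }
  `);
  if (result.errors) {
    throw result.errors;
  }

  for (const node of result.data.allIfscCsv.nodes) {
    createPage({
      path: `/${node.slug}/`,
      component: "src/templates/ifsc.tsx", // normally resolved to an absolute path
      // Pass id (not slug) so the template's page query can filter on id.
      context: { id: node.id },
    });
  }
}
```

The page query in the template would then filter on the $id variable instead of the slug, so that it can take the by-id shortcut described earlier.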
With that, the run queries step should take roughly 5 minutes on Gatsby master.
If you want counting stats while building for your pages (hey, that’s 60 seconds less of looking at an idle screen) you can copy paste my whole change, which will use a progress bar for the createPages step (this is gatsby-config again):
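That change is not reproduced here. As a dependency-free stand-in (a plain counter rather than a real progress bar), something along these lines would print periodic counts during page creation. The makeCounter helper and its parameters are hypothetical names, not part of Gatsby.

```javascript
// Stand-in sketch: a periodic counter, not the progress bar from the thread.
// `makeCounter`, `total`, `every`, and `log` are hypothetical names.
function makeCounter(total, every, log = console.log) {
  let done = 0;
  return function tick() {
    done += 1;
    // Report every `every` items, and once more when the last item is done.
    if (done % every === 0 || done === total) {
      log(`createPages: ${done}/${total}`);
    }
    return done;
  };
}

// Usage: call tick() once for each createPage call.
const tick = makeCounter(5, 2);
for (let i = 0; i < 5; i++) tick();
```

For ~150k pages a reporting interval in the thousands keeps the output readable while still showing that the build is moving.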