gatsby-transformer-json out of memory
Preliminary Checks
- This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/gatsbyjs/gatsby/issues
- This issue is not a question, feature request, RFC, or anything other than a bug report directly related to Gatsby. Please post those things in GitHub Discussions: https://github.com/gatsbyjs/gatsby/discussions
Description
I’m running a 10k+ page site on Gatsby Cloud, and in a dedicated branch I have been trying to upgrade to the Gatsby 4 dependencies for a while. Unfortunately I never got it running due to high memory consumption:
> Your Gatsby build's memory consumption exceeded the limits allowed in your plan.
> For more details, see https://gatsby.dev/memory.
Local builds on my machine also consumed huge amounts of memory until I killed the process.

Long story short, today I had some time to dig in, commented out all my plugins, and isolated the problem. The source of the issue is this line: https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-transformer-json/src/gatsby-node.js#L63
A `.forEach` loop over the array items of a JSON file creates the individual JSON nodes. What's the issue, you may ask? Gatsby's `createNode` function, which is used by `transformObject`, returns a

> Promise [which] resolves when all cascading onCreateNode API calls triggered by createNode have finished.

whereas `.forEach`:

> Note: forEach expects a synchronous function. forEach does not wait for promises. Make sure you are aware of the implications while using promises (or async functions) as forEach callback.
The result is that the loop kicks off all 10k transformations at once instead of the serial (or concurrency-limited) processing the code suggests.
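To make that concrete, here is a minimal sketch (not the plugin code; `createHeavyNode` is a hypothetical stand-in for the promise-returning `createNode`/`transformObject` work):

```js
// Minimal sketch, not the plugin code. `createHeavyNode` is a hypothetical
// stand-in for the promise-returning createNode/transformObject work.
const createHeavyNode = item =>
  new Promise(resolve => setTimeout(() => resolve(item), 10))

const items = Array.from({ length: 10000 }, (_, i) => i)

// forEach ignores the promise returned by an async callback, so all
// 10k transformations are started immediately and pile up in memory.
items.forEach(async item => {
  await createHeavyNode(item)
})

// A regular loop awaits each call, so only one item is in flight at a time.
async function serial() {
  for (const item of items) {
    await createHeavyNode(item)
  }
}
```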
The fix for that seems quite simple:
- mark `transformObject` as an `async` function
- use a regular loop which `await`s the `transformObject` call
```js
if (_.isArray(parsedContent)) {
  for (let i = 0, l = parsedContent.length; i < l; i++) {
    const obj = parsedContent[i];
    await transformObject(
      obj,
      createNodeId(`${node.id} [${i}] >>> JSON`),
      getType({ node, object: obj, isArray: true })
    )
  }
}
```
I feel you could get fancy here and use something like .eachLimit to add some concurrency and speed it up again.
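For example, a sketch of that idea as a drop-in variant of the loop above, assuming the `async` package were added as a dependency (`eachOfLimit` is used instead of `eachLimit` so the array index stays available; the concurrency of 10 is arbitrary):

```js
// Sketch only: same logic as the loop above, but with a concurrency
// limit of 10 via the `async` package's eachOfLimit helper.
const { eachOfLimit } = require("async")

if (_.isArray(parsedContent)) {
  await eachOfLimit(parsedContent, 10, async (obj, i) => {
    await transformObject(
      obj,
      createNodeId(`${node.id} [${i}] >>> JSON`),
      getType({ node, object: obj, isArray: true })
    )
  })
}
```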
Reproduction Link
https://github.com/joernroeder/gatsby-json-memory
Steps to Reproduce
- clone the reproduction link and run `gatsby start`
- watch the memory consumption
Expected Result
No multi-GB memory usage 😃
Actual Result
broken builds
Environment
```
Binaries:
  Node: 14.16.1
  Yarn: 1.22.17
  npm: 7.16.0 - ~/.nvm/versions/node/v14.16.1/bin/npm
Languages:
  Python: 2.7.16 - /usr/bin/python
npmPackages:
  gatsby: @next => 4.3.0-next.1
  gatsby-source-filesystem: ^4.2.0 => 4.2.0
  gatsby-transformer-json: ^4.2.0 => 4.2.0
```
Config Flags
No response
What changed with v4 is that we now write data to LMDB, which is more costly than writing to memory. The sync forEach loop doesn't allow LMDB to flush data to disk, meaning in-progress changes accumulate. Switching to async solves it, as does using e.g. process.nextTick to let the event loop run through again.
@KyleAMathews ah that makes sense, didn’t know LMDB is part of gatsby 4 by default – that’s awesome!
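As a footnote to the maintainer's suggestion above, a minimal sketch of the "let the event loop run through" alternative (the helper and batch size are hypothetical, and `setImmediate` is used instead of `process.nextTick` because it defers until the event loop has completed a full turn):

```js
// Hypothetical variant, not the shipped fix: keep firing the transformations
// without awaiting each one, but periodically yield back to the event loop
// so LMDB gets a chance to flush pending writes to disk.
const yieldToEventLoop = () => new Promise(resolve => setImmediate(resolve))

async function transformAll(items, transformOne) {
  for (let i = 0; i < items.length; i++) {
    transformOne(items[i], i) // promise intentionally not awaited
    if (i % 100 === 0) {
      await yieldToEventLoop() // let queued I/O (LMDB writes) run
    }
  }
}
```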