
gatsby-transformer-json out of memory


Description

I’m running a site with 10k+ pages on Gatsby Cloud, and for a while I have been upgrading it to the Gatsby 4 dependencies in a dedicated branch. Unfortunately, I never got it running because of high memory consumption.

Your Gatsby build's memory consumption exceeded the limits allowed in your plan. 
For more details, see https://gatsby.dev/memory.

Local builds on my machine also consumed huge amounts of memory until I killed the process.

Long story short: today I had some time to dig in, commented out all my plugins, and isolated the problem. The source is this line: https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-transformer-json/src/gatsby-node.js#L63

It is a .forEach loop over the array items of a JSON file that creates the individual JSON nodes. What’s the issue, you may ask?

Gatsby’s createNode function, which is used by transformObject, returns a

Promise [which] resolves when all cascading onCreateNode API calls triggered by createNode have finished.

whereas the documentation for .forEach notes:

Note: forEach expects a synchronous function.

forEach does not wait for promises. Make sure you are aware of the implications while using promises (or async functions) as forEach callback.

The result is that the loop kicks off all 10k transformations right away, instead of running them serially or with limited concurrency as the code suggests.
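To make the difference concrete, here is a minimal sketch that is not taken from the Gatsby code base. `fakeCreateNode` is a hypothetical stand-in for any promise-returning call such as createNode; it only counts how many calls are in flight at the same time.

```javascript
// Hypothetical stand-in for an async createNode-style call: it tracks how many
// calls are in flight at once and resolves on a later event-loop turn.
function makeTracker() {
  const state = { inFlight: 0, peak: 0 };
  async function fakeCreateNode() {
    state.inFlight += 1;
    state.peak = Math.max(state.peak, state.inFlight);
    await new Promise((resolve) => setImmediate(resolve));
    state.inFlight -= 1;
  }
  return { state, fakeCreateNode };
}

// forEach ignores the returned promises, so every call starts before any finishes.
async function runWithForEach(items) {
  const { state, fakeCreateNode } = makeTracker();
  const pending = [];
  items.forEach((item) => {
    pending.push(fakeCreateNode(item));
  });
  await Promise.all(pending); // only here so the demo can read the final peak
  return state.peak;
}

// A for...of loop with await starts the next call only after the previous one settles.
async function runWithAwaitedLoop(items) {
  const { state, fakeCreateNode } = makeTracker();
  for (const item of items) {
    await fakeCreateNode(item);
  }
  return state.peak;
}

async function demoPeaks() {
  const items = Array.from({ length: 1000 }, (_, i) => i);
  console.log("forEach peak in flight:", await runWithForEach(items));
  console.log("awaited loop peak in flight:", await runWithAwaitedLoop(items));
}

demoPeaks();
```

With 1000 items, the forEach variant reaches a peak of 1000 calls in flight, while the awaited loop never exceeds 1.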

The fix for that seems quite simple:

  • mark transformObject as an async function
  • use a regular loop that awaits each transformObject call, for example:
  if (_.isArray(parsedContent)) {
    for (let i = 0, l = parsedContent.length; i < l; i++) {
      const obj = parsedContent[i];

      await transformObject(
        obj,
        createNodeId(`${node.id} [${i}] >>> JSON`),
        getType({ node, object: obj, isArray: true })
      )
    }
  }

I feel you could get fancy here and use something like .eachLimit to add some concurrency and speed it up again.
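As a rough sketch of what such a bounded-concurrency loop could look like without pulling in a library: `forEachWithLimit` below is a hypothetical hand-rolled helper, not the .eachLimit function mentioned above and not anything from Gatsby. It runs at most `limit` calls of an async `worker` at a time.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function forEachWithLimit(items, limit, worker) {
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed index until none are left.
  // Claiming is safe without locks because JavaScript runs this code on one thread.
  async function lane() {
    while (nextIndex < items.length) {
      const i = nextIndex;
      nextIndex += 1;
      await worker(items[i], i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
}

async function demoLimit() {
  let inFlight = 0;
  let peak = 0;
  let processed = 0;
  const items = Array.from({ length: 100 }, (_, i) => i);

  await forEachWithLimit(items, 8, async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setImmediate(resolve)); // simulated async work
    processed += 1;
    inFlight -= 1;
  });

  console.log(`processed ${processed} items, peak in flight: ${peak}`);
}

demoLimit();
```

Note that if one worker call rejects, the returned promise rejects while the other lanes keep running until their current call settles; a real implementation would probably want to stop them.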

Reproduction Link

https://github.com/joernroeder/gatsby-json-memory

Steps to Reproduce

  1. Clone the reproduction repository and run gatsby start
  2. Watch the memory consumption

Expected Result

no multi GB memory usage 😃

Actual Result

broken builds

Environment

  Binaries:
    Node: 14.16.1
    Yarn: 1.22.17
    npm: 7.16.0 - ~/.nvm/versions/node/v14.16.1/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  npmPackages:
    gatsby: @next => 4.3.0-next.1 
    gatsby-source-filesystem: ^4.2.0 => 4.2.0 
    gatsby-transformer-json: ^4.2.0 => 4.2.0

Config Flags

No response

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
KyleAMathews commented, Nov 26, 2021

What changed with v4 is that we now write data to LMDB, which is more costly than writing to memory. The sync forEach loop doesn’t allow LMDB to flush data to disk, meaning in-progress changes accumulate. Switching to async solves this, as does using e.g. process.nextTick to let the event loop run through again.
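A minimal sketch of the yielding idea, under stated assumptions: `processInBatches` and `yieldToEventLoop` are hypothetical names, and the sketch uses setImmediate rather than process.nextTick, because setImmediate schedules the continuation for a later event-loop iteration. It illustrates the general pattern only and says nothing about how Gatsby or LMDB behave internally.

```javascript
// Resolves on a later iteration of the event loop, so other queued work can run first.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Hypothetical helper: handles items synchronously, but pauses after every
// `batchSize` items so the event loop is not blocked for the whole array.
// Returns how many times it yielded.
async function processInBatches(items, batchSize, handle) {
  let yields = 0;
  for (let i = 0; i < items.length; i++) {
    handle(items[i], i);
    if ((i + 1) % batchSize === 0) {
      await yieldToEventLoop();
      yields += 1;
    }
  }
  return yields;
}

async function demoYield() {
  const handled = [];
  const items = Array.from({ length: 10000 }, (_, i) => i);
  const yields = await processInBatches(items, 500, (item) => handled.push(item));
  console.log(`handled ${handled.length} items with ${yields} yields`);
}

demoYield();
```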

0 reactions
joernroeder commented, Nov 27, 2021

@KyleAMathews ah that makes sense, didn’t know LMDB is part of gatsby 4 by default – that’s awesome!
