[gatsby-source-wordpress] Large WordPress site causing extremely slow build time (stuck at 'source and transform nodes')

Description

gatsby develop hangs on source and transform nodes after querying a large WordPress installation (~9000 posts, ~35 pages).

Are there any guidelines on how large a site Gatsby can handle in this regard?

Environment

  System:
    OS: macOS High Sierra 10.13.6
    CPU: x64 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 8.10.0 - ~/n/bin/node
    Yarn: 1.5.1 - ~/n/bin/yarn
    npm: 5.6.0 - ~/n/bin/npm
  Browsers:
    Chrome: 67.0.3396.99
    Safari: 11.1.2
  npmPackages:
    gatsby: ^1.9.273 => 1.9.273
    gatsby-image: ^1.0.54 => 1.0.54
    gatsby-link: ^1.6.45 => 1.6.45
    gatsby-plugin-google-analytics: ^1.0.27 => 1.0.31
    gatsby-plugin-postcss-sass: ^1.0.22 => 1.0.22
    gatsby-plugin-react-helmet: ^2.0.10 => 2.0.11
    gatsby-plugin-react-next: ^1.0.11 => 1.0.11
    gatsby-plugin-resolve-src: 1.1.3 => 1.1.3
    gatsby-plugin-sharp: ^1.6.48 => 1.6.48
    gatsby-plugin-svgr: ^1.0.1 => 1.0.1
    gatsby-source-filesystem: ^1.5.39 => 1.5.39
    gatsby-source-wordpress: ^2.0.93 => 2.0.93
    gatsby-transformer-sharp: ^1.6.27 => 1.6.27
  npmGlobalPackages:
    gatsby-cli: 1.1.58

edit: Just want to reiterate: this is not something easily fixable by deleting .cache/, node_modules, etc. If that resolves your problem, you weren't experiencing this issue.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 19
  • Comments: 156 (84 by maintainers)

Top GitHub Comments

njmyers commented, May 31, 2019 (9 reactions)

Guys, I managed to fix this by running createRemoteFileNode requests in serial instead of parallel.

Yeah, the underlying issue is that createRemoteFileNode uses a concurrency of 200, which is too much for most WP servers. My images are on CloudFront and I was hitting rate limits there.

I tried fixing the issue with a branched version of the source plugin for a while, but the issue really isn't in gatsby-source-wordpress; it is in gatsby-source-filesystem. Ideally, consumers of the createRemoteFileNode function would be able to pass in a concurrency value. Plugins could then expose a concurrency option in their configs. I would still like to open a PR to address this!

The solution I have been using is a simple script that modifies the code inside node_modules. It is quite fragile and not ideal, but it is a simple hack to change the concurrency directly. It uses shelljs, so it is supposed to work for Windows users as well (I haven't tried).

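#!/usr/bin/env node
// Patches gatsby-source-filesystem inside node_modules to lower download concurrency.
const path = require('path');
const shell = require('shelljs');

// Absolute path to the installed create-remote-file-node.js.
const FILE_PATH = path.resolve(
  __dirname,
  // add path to your root dir here,
  'node_modules',
  'gatsby-source-filesystem/create-remote-file-node.js'
);

// Replace the hard-coded concurrency of 200 with 20, editing the file in place.
shell.sed('-i', 'concurrent: 200', 'concurrent: 20', FILE_PATH);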
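
For anyone who would rather not patch node_modules, the "serial instead of parallel" idea quoted at the top can be sketched as a small helper that caps how many downloads are in flight at once. This is only a minimal sketch: mapWithConcurrency is a hypothetical name, not part of Gatsby or gatsby-source-filesystem, and the task you pass in is assumed to be whatever function in your own code wraps the createRemoteFileNode call.

```javascript
// Hypothetical helper (not a Gatsby API): run `task` over `items`
// with at most `limit` calls in flight at any time.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each worker pulls the next unclaimed item until none are left.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  // Start no more workers than there are items, and at least one.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results; // same order as `items`
}
```

With a limit of 1 the downloads run strictly one after another; a limit of around 20 would roughly match the patched value in the script above.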

njmyers commented, Nov 12, 2018 (9 reactions)

Hello,

I managed to add tracing using the steps outlined at https://www.gatsbyjs.org/docs/performance-tracing/. Unfortunately, it did not provide much information; it simply confirmed that source and transform nodes is indeed taking quite long.

However, I have done some of my own debugging after seeing non-deterministic behavior involving images. When running either the develop or build script, not all of the images would be downloaded and the localFile nodes would not complete. After digging into the code, I determined that there seems to be an issue here:

https://github.com/gatsbyjs/gatsby/blob/ad142af473fc8dc8555a5cf23a0dfca42fcbbe90/packages/gatsby-source-wordpress/src/normalize.js#L483-L506

For me, createRemoteFileNode was failing due to server timeout errors, and it defaults to returning null. I had to add some logging to createRemoteFileNode to determine this and see the actual server responses. Since these nodes don't complete and do not have IDs, they don't get registered in the cache. The tmp files are deleted and the gatsby-source-filesystem was left incomplete. For whatever reason (I haven't looked that far yet), on running the build script again the source-filesystem was then deleted, probably because the script detects that the filesystem is invalid or incomplete. For me, this process created a loop and caused errors on future builds, as the filesystem never completes.
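
To make this kind of silent failure visible, one option is to wrap each download in a retry helper that logs every failed attempt and treats a null result as an error. The sketch below is an illustration under assumptions, not the fix I am working on: withRetry and its options are hypothetical names, and fn stands in for whatever function in your code performs the download.

```javascript
// Hypothetical helper (not a Gatsby API): retry an async download,
// logging each failure instead of silently ending up with null.
async function withRetry(fn, { retries = 3, delayMs = 1000, log = console.warn } = {}) {
  const attempts = Math.max(1, retries);
  let lastError;

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const result = await fn();
      if (result != null) return result;
      // A null/undefined result is treated as a failed download.
      lastError = new Error('download returned no node');
    } catch (err) {
      lastError = err;
    }
    log(`attempt ${attempt}/${attempts} failed: ${lastError.message}`);
    if (attempt < attempts) {
      // Back off a little longer after each failure.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}
```

Throwing after the last attempt means an incomplete set of images stops the run with a clear message rather than leaving a partial cache behind.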

I'm working on a fix that seems to alleviate some of the issues, at least for large numbers of images. When the develop or build script succeeds in downloading all of the images the first time, the filesystem is not subsequently deleted, and the build then runs quite quickly because the images are properly cached by gatsby-source-filesystem! My build went from 15 minutes down to 1 minute.

I'm not sure whether this is related to builds that have large numbers of posts. My issue was directly related to downloading 1.6 GB of image data.

This is my first time working with source plugins for Gatsby, so if anyone has any thoughts or advice I would appreciate it! I should be able to post my repo later today; I am working on getting it to use my local version of gatsby-source-filesystem without complications.
