[gatsby-source-wordpress] Large WordPress site causing extremely slow build time (stuck at 'source and transform nodes')
Description
gatsby develop hangs on 'source and transform nodes' after querying a large WordPress installation (~9000 posts, ~35 pages).
Are there any guidelines on what’s too big for Gatsby to handle in this regard?
Environment
System:
OS: macOS High Sierra 10.13.6
CPU: x64 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Shell: 3.2.57 - /bin/bash
Binaries:
Node: 8.10.0 - ~/n/bin/node
Yarn: 1.5.1 - ~/n/bin/yarn
npm: 5.6.0 - ~/n/bin/npm
Browsers:
Chrome: 67.0.3396.99
Safari: 11.1.2
npmPackages:
gatsby: ^1.9.273 => 1.9.273
gatsby-image: ^1.0.54 => 1.0.54
gatsby-link: ^1.6.45 => 1.6.45
gatsby-plugin-google-analytics: ^1.0.27 => 1.0.31
gatsby-plugin-postcss-sass: ^1.0.22 => 1.0.22
gatsby-plugin-react-helmet: ^2.0.10 => 2.0.11
gatsby-plugin-react-next: ^1.0.11 => 1.0.11
gatsby-plugin-resolve-src: 1.1.3 => 1.1.3
gatsby-plugin-sharp: ^1.6.48 => 1.6.48
gatsby-plugin-svgr: ^1.0.1 => 1.0.1
gatsby-source-filesystem: ^1.5.39 => 1.5.39
gatsby-source-wordpress: ^2.0.93 => 2.0.93
gatsby-transformer-sharp: ^1.6.27 => 1.6.27
npmGlobalPackages:
gatsby-cli: 1.1.58
edit: Just want to reiterate: this is not something easily fixable by deleting .cache/, node_modules, etc. If that resolves your problem, you weren’t experiencing this issue.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:19
- Comments:156 (84 by maintainers)
Yeah, the issue is really based on the fact that createRemoteFileNode uses a concurrency of 200, which is too much for most WP servers. I have my images on CloudFront and was hitting some rate limits there.

I tried fixing the issue with a branched version of the source plugin for a while, but the issue really isn’t in gatsby-source-wordpress; it is in gatsby-source-filesystem. Ideally, consumers of the createRemoteFileNode function would be able to pass in concurrency there. Then plugins could make the concurrency option available in their configs. I would still like to do a PR to address this issue!

The solution I have been using is just a simple script to modify the code inside node_modules. Really quite fragile and not ideal, but it is a simple hack to modify the concurrency directly. It uses shelljs, so it is supposed to work for Windows users as well (haven’t tried).

Hello,
I managed to add tracing using the steps outlined here https://www.gatsbyjs.org/docs/performance-tracing/. Unfortunately it did not provide much info as it simply told me that indeed source and transform nodes is taking quite long.
I have however done some of my own debugging on the issue after having some non-deterministic behavior involving images. When running either develop or build script I would get a case where not all of the images would be downloaded and the localFile nodes would not complete. After digging into the code I have determined that there seems to be an issue here
https://github.com/gatsbyjs/gatsby/blob/ad142af473fc8dc8555a5cf23a0dfca42fcbbe90/packages/gatsby-source-wordpress/src/normalize.js#L483-L506
For me, createRemoteFileNode was failing due to server timeout errors, and it defaults to returning null in that case. I had to add some logging to createRemoteFileNode as well to determine this and get the actual server responses. Since these nodes don’t complete and do not have IDs, they don’t get registered in the cache. The tmp files were deleted and the gatsby-source-filesystem cache was left incomplete. For whatever reason (I haven’t looked that far yet), upon running the build script again the source-filesystem cache was then deleted, probably because the script detects that it is invalid or incomplete. For me, this process was creating a loop and causing errors on future builds, as the cache never completes.
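To illustrate the failure mode above, here is a minimal sketch (not the plugin’s actual code) of downloading with a bounded number of requests in flight while recording each failure instead of silently returning null. The helper name mapWithConcurrency and the worker callback are hypothetical; in a real setup the worker would be whatever function fetches one remote file.

```javascript
// Hypothetical helper, not part of Gatsby: runs `worker` over `items`
// with at most `limit` calls in flight at any time.
// Failures are captured per item instead of being swallowed, so a
// server timeout shows up in the results rather than as a bare null.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner claims items one at a time until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  // Start `limit` runners (or fewer if there are not that many items).
  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With something like this, the entries where ok is false can be logged together with the server’s error, which is the information I had to add logging to get.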
I’m working on a fix that seems to alleviate some of the issues, at least regarding large amounts of images. When the develop or build script succeeds in downloading all of the images the first time, the downloaded files are not deleted afterwards, and the build then runs quite rapidly because the images are properly cached by gatsby-source-filesystem! My build went from 15 minutes down to 1 minute.
I’m not sure whether this is related to builds that have large amounts of posts. My issue was directly related to downloading 1.6 GB of image data.
This is my first time working with source plugins for Gatsby, so if anyone has any thoughts or advice regarding this I would appreciate it! I should be able to post my repo later today; I am working on getting it to use my local version of gatsby-source-filesystem without complications.
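For anyone who wants to try the node_modules workaround mentioned earlier in the thread before a proper concurrency option exists, a rough sketch could look like the following. This is not the script from that comment (which used shelljs); it uses only Node’s built-in fs module. Both the file path in the usage note and the assumption that the installed file contains a literal setting of the form concurrent: 200 are guesses that should be checked against your installed version.

```javascript
const fs = require("fs");

// Assumption: the installed source contains a queue option written as
// `concurrent: <number>` (for example `concurrent: 200`).
// Returns the source text with that number replaced by `newValue`.
function lowerConcurrency(source, newValue) {
  if (!/concurrent:\s*\d+/.test(source)) {
    throw new Error("no `concurrent: <number>` setting found; nothing patched");
  }
  return source.replace(/concurrent:\s*\d+/g, `concurrent: ${newValue}`);
}

// Reads the file at `filePath`, patches it, and writes it back in place.
// The change is lost whenever node_modules is reinstalled.
function patchFile(filePath, newValue) {
  const original = fs.readFileSync(filePath, "utf8");
  fs.writeFileSync(filePath, lowerConcurrency(original, newValue));
}
```

Usage would be something like patchFile("node_modules/gatsby-source-filesystem/create-remote-file-node.js", 20), run from a postinstall script, where the path is an assumption about how the package is laid out. It is just as fragile as described above.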