Uneven .md vs. .mdx build times
Description
We run a documentation website that has hit a performance bottleneck at around 1000 pages. This led us to test the difference between gatsby-transformer-remark and gatsby-plugin-mdx to compare .md and .mdx build times.
We realize that these are not the same plugin, but we expected their build times to be more in line with one another for the exact same files.
We used the following repo to benchmark results: https://github.com/johnatspreadstreet/gatsby-md-vs-mdx
Here were the results of our test using the auto generated files:
Source and Transform Nodes
| # of Pages | md | mdx |
| --- | --- | --- |
| 100 | 0.17s | 3.12s |
| 1000 | 0.90s | 23.05s |
| 8000 | 5.53s | 192.80s |
Steps to reproduce
GitHub repo: https://github.com/johnatspreadstreet/gatsby-md-vs-mdx
For markdown files:
- Make sure line 54 of md.generate.js is set to create .md
- Make sure gatsby-transformer-remark is used (and not gatsby-plugin-mdx) in the gatsby-config.js file
- Run npm run bench or yarn run bench
For mdx files:
- Make sure line 54 of md.generate.js is set to create .mdx
- Make sure gatsby-plugin-mdx is used (and not gatsby-transformer-remark) in the gatsby-config.js file
- Run npm run bench or yarn run bench
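The plugin swap in the steps above could be sketched roughly as follows. This is only an illustration, not the benchmark repo's actual gatsby-config.js: the BENCH_FORMAT environment variable, the "pages" source name, and the content directory are all assumptions made for the example.

```javascript
// gatsby-config.js (illustrative sketch; the benchmark repo's real file may differ)

// Hypothetical toggle: set BENCH_FORMAT=mdx to benchmark .mdx files.
const useMdx = process.env.BENCH_FORMAT === "mdx";

module.exports = {
  plugins: [
    {
      // Reads the generated files from disk and creates File nodes.
      resolve: "gatsby-source-filesystem",
      options: {
        name: "pages", // assumed name
        path: `${__dirname}/content`, // assumed location of the generated files
      },
    },
    // Use exactly one of the two transformers per benchmark run.
    useMdx ? "gatsby-plugin-mdx" : "gatsby-transformer-remark",
  ],
};
```

Keeping only one transformer active per run means the "source and transform nodes" timing reflects that plugin alone.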
Expected result
Results of the gatsby build process for .md and .mdx files should be within reasonable distance of one another.
Actual result
Build times for .mdx were roughly 18 to 35 times longer than for .md in the source and transform nodes step.
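As a quick check, the slowdown factors can be recomputed from the timings in the results table. The snippet below is just arithmetic on those reported numbers; the variable names are made up for the example.

```javascript
// Timings copied from the results table (seconds, "source and transform nodes" step).
const results = [
  { pages: 100, md: 0.17, mdx: 3.12 },
  { pages: 1000, md: 0.9, mdx: 23.05 },
  { pages: 8000, md: 5.53, mdx: 192.8 },
];

// For each page count, compute how many times slower mdx was than md,
// plus the average mdx cost per page in milliseconds.
const slowdowns = results.map(({ pages, md, mdx }) => ({
  pages,
  slowdown: mdx / md,
  mdxMsPerPage: (mdx / pages) * 1000,
}));

for (const row of slowdowns) {
  console.log(
    `${row.pages} pages: mdx/md = ${row.slowdown.toFixed(1)}x, ` +
      `${row.mdxMsPerPage.toFixed(1)} ms per .mdx page`
  );
}
```

The ratio grows with the page count, while the per-page .mdx cost stays in the same rough range across all three runs.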
Environment
System:
  OS: Windows 10
  CPU: (16) x64 Intel® Core™ i9-9900K CPU @ 3.60GHz
Binaries:
  Yarn: 1.18.0 - C:\Program Files (x86)\Yarn\bin\yarn.CMD
  npm: 6.9.0 - C:\Program Files\nodejs\npm.CMD
Languages:
  Python: 2.7.15 - /c/Users/JYoun/.windows-build-tools/python27/python
Browsers:
  Edge: 44.18362.449.0
npmPackages:
  gatsby: ^2.19.5 => 2.19.45
  gatsby-image: ^2.2.39 => 2.2.44
  gatsby-plugin-benchmark-reporting: * => 0.0.13
  gatsby-plugin-feed: ^2.3.26 => 2.3.29
  gatsby-plugin-google-analytics: ^2.1.34 => 2.1.38
  gatsby-plugin-manifest: ^2.2.38 => 2.2.48
  gatsby-plugin-mdx: ^1.0.83 => 1.0.83
  gatsby-plugin-offline: ^3.0.32 => 3.0.41
  gatsby-plugin-react-helmet: ^3.1.21 => 3.1.24
  gatsby-plugin-sharp: ^2.4.0 => 2.4.13
  gatsby-plugin-typography: ^2.3.21 => 2.3.25
  gatsby-remark-copy-linked-files: ^2.1.36 => 2.1.40
  gatsby-remark-images: ^3.1.42 => 3.1.50
  gatsby-remark-prismjs: ^3.3.30 => 3.3.36
  gatsby-remark-responsive-iframe: ^2.2.31 => 2.2.34
  gatsby-remark-smartypants: ^2.1.20 => 2.1.23
  gatsby-source-filesystem: ^2.1.46 => 2.1.56
  gatsby-transformer-remark: ^2.6.48 => 2.6.59
  gatsby-transformer-sharp: ^2.3.13 => 2.3.19
Here’s where I’m at so far: yesterday I ran benchmarks and could confirm I’m seeing the same growth in time during sourcing/transforming nodes. Each time I double the number of MDX pages I see ~2x increase in time in “source and transform nodes”. It appears to be continually growing, too. 🏒
After some digging I found a usage of a manual node filter for type using getNodes rather than getNodesByType. This, unsurprisingly, caused a lot of unnecessary node traversal at scale. That fix (in #22555) saw a ~30% reduction in build times for 16k pages (using my benchmark site so YMMV).
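To illustrate why filtering getNodes by hand costs more than calling getNodesByType, here is a minimal sketch. It uses a hypothetical in-memory store written for this example, not Gatsby's real implementation or the code changed in the fix. Only the two helper names and the internal.type field mirror Gatsby, and the node counts are made up.

```javascript
// Hypothetical node store for illustration only; NOT Gatsby's internals.
function createNodeStore(nodes) {
  // Index nodes by type once, so lookups by type avoid a full scan.
  const byType = new Map();
  for (const node of nodes) {
    const type = node.internal.type;
    if (!byType.has(type)) byType.set(type, []);
    byType.get(type).push(node);
  }
  return {
    getNodes: () => nodes,
    getNodesByType: (type) => byType.get(type) || [],
  };
}

// Made-up data: 10,000 nodes, of which every tenth is an Mdx node.
const nodes = [];
for (let i = 0; i < 10000; i++) {
  nodes.push({ id: `node-${i}`, internal: { type: i % 10 === 0 ? "Mdx" : "File" } });
}
const store = createNodeStore(nodes);

// Slow pattern: visit every node of every type and filter by hand.
const viaFilter = store.getNodes().filter((n) => n.internal.type === "Mdx");

// Faster pattern: ask the store for the type directly.
const viaType = store.getNodesByType("Mdx");

console.log(viaFilter.length, viaType.length);
```

Both calls return the same nodes, but the manual filter touches all nodes on every call. If that happens once per MDX node, the total work grows much faster than the page count.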
I’ll keep digging into this as I get time and will report back any additional findings and performance improvements as we get them PRed in. I suspect there’s a lot more low hanging fruit.
It’s also important to note that MDX will always be quite a bit slower than MD since it’s doing a lot more under the covers, but it definitely shouldn’t be 📈🙀
@johno Thanks for the updates here. Here is the second round of testing after pulling down the new packages (benchmark repo: https://github.com/johnatspreadstreet/gatsby-md-vs-mdx):
Looks like ~5% better performance in the upper bucket, with pretty much even performance in the lower buckets.
If you need me to test anything else, or have hunches you'd like checked, I'm happy to do so.