17K pages from JSON files takes hours to build

Summary

I’m building about 17k pages from 17k pre-generated JSON files (each file contains the data needed to build one page). The build takes more than 8 hours to finish. With the same setup, building 500 pages takes less than 2 minutes, and building 7,000 pages takes about 40 minutes. I have also tried switching off type inference for SitePage.context, as suggested in the official documentation, but it didn’t help. Please share your thoughts if you see anything I should change to fix the build speed. I appreciate your help!

Relevant information

System output from a build:

success createPages - 2123.900s
success createPagesStatefully - 4.542s
success updating schema - 1.704s
success onPreExtractQueries - 0.024s
success extract queries from components - 9.105s
success write out redirect data - 0.027s
success Build manifest and related icons - 1.233s
success onPostBootstrap - 1.729s
info bootstrap finished - 2239.531s
success run page queries - 264.055s - 17324/17324 65.61/s
success write out requires - 1.744s
success Building production JavaScript and CSS bundles - 157.205s
success Building static HTML for pages - 836.869s - 17324/17324 20.70/s
⠋ onPostBuild  <-- this step could take a few hours

Environment (if relevant)


  System:
    OS: macOS 10.15.5
    CPU: (8) x64 Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 10.17.0 - /usr/local/opt/nvm/versions/node/v10.17.0/bin/node
    Yarn: 1.22.4 - /usr/local/opt/nvm/versions/node/v10.17.0/bin/yarn
    npm: 6.11.3 - /usr/local/opt/nvm/versions/node/v10.17.0/bin/npm
  Languages:
    Python: 2.7.15 - /usr/local/bin/python
  Browsers:
    Chrome: 84.0.4147.135
    Firefox: 72.0.2
    Safari: 13.1.1
  npmPackages:
    gatsby: 2.23.12 => 2.23.12 
    gatsby-image: 2.4.9 => 2.4.9 
    gatsby-plugin-manifest: 2.4.14 => 2.4.14 
    gatsby-plugin-offline: 3.2.13 => 3.2.13 
    gatsby-plugin-react-helmet: 3.3.6 => 3.3.6 
    gatsby-plugin-react-helmet-canonical-urls: ^1.4.0 => 1.4.0 
    gatsby-plugin-react-leaflet: ^2.0.13 => 2.0.13 
    gatsby-plugin-remove-trailing-slashes: ^2.3.11 => 2.3.11 
    gatsby-plugin-sharp: 2.6.14 => 2.6.14 
    gatsby-plugin-sitemap: 2.3.6 => 2.3.6 
    gatsby-source-filesystem: 2.3.14 => 2.3.14 
    gatsby-transformer-sharp: 2.5.7 => 2.5.7 
  npmGlobalPackages:
    gatsby-cli: 2.12.62

File contents (if changed)

gatsby-config.js:

module.exports = {
  siteMetadata: {
    title: ``,
    description: ``,
    author: ``,
    siteUrl: `https://www.my-site.com`,
  },
  plugins: [
    `gatsby-plugin-react-helmet`,
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `images`,
        path: `${__dirname}/src/images`,
      },
    },
    `gatsby-plugin-react-helmet`,
    {
      resolve: `gatsby-plugin-react-helmet-canonical-urls`,
      options: {
        siteUrl: `https://www.my-site.com`,
        noTrailingSlash: true
      },
    },
    `gatsby-transformer-sharp`,
    `gatsby-plugin-sharp`,
    {
      resolve: `gatsby-plugin-manifest`,
      options: {
        name: `gatsby-starter-default`,
        short_name: `starter`,
        start_url: `/`,
        background_color: `#663399`,
        theme_color: `#663399`,
        display: `minimal-ui`,
        icon: `src/images/color_logo_no_padding.png`, // This path is relative to the root of the site.
      },
    },
    {
      resolve: 'gatsby-plugin-react-leaflet',
      options: {
        linkStyles: false // (default: true) Enable/disable loading stylesheets via CDN
      }
    },
    {
      resolve: `gatsby-plugin-sitemap`,
      options: {
        createLinkInHead: false,
        sitemapSize: 1000,
        exclude: ['/redirect', '/search'],
        output: `/sitemap.xml`,
        query: `
          {
            site {
              siteMetadata {
                siteUrl
              }
            }
            allSitePage {
              nodes {
                path
              }
            }
        }`,
        resolveSiteUrl: ({site, allSitePage}) => {
          return site.siteMetadata.siteUrl
        },
        serialize: ({ site, allSitePage }) =>
          allSitePage.nodes.map(node => {
            return {
              url: `${site.siteMetadata.siteUrl}${node.path}`,
              changefreq: `daily`,
              lastmod: new Date().toISOString().split('T')[0], // adds the lastmod entry with a date either parsed or today
              priority: 0.9,
            }
          })
      },
    },
    `gatsby-plugin-remove-trailing-slashes`,
    // this (optional) plugin enables Progressive Web App + Offline functionality
    // To learn more, visit: https://gatsby.dev/offline
    // `gatsby-plugin-offline`,
  ],
}

package.json:

{
  "name": "my-site",
  "private": true,
  "description": "my website",
  "version": "0.1.0",
  "author": "Allan",
  "dependencies": {
    "@babel/runtime": "7.10.5",
    "@reach/router": "^1.3.4",
    "gatsby": "2.23.12",
    "gatsby-image": "2.4.9",
    "gatsby-plugin-manifest": "2.4.14",
    "gatsby-plugin-offline": "3.2.13",
    "gatsby-plugin-react-helmet": "3.3.6",
    "gatsby-plugin-react-helmet-canonical-urls": "^1.4.0",
    "gatsby-plugin-react-leaflet": "^2.0.13",
    "gatsby-plugin-remove-trailing-slashes": "^2.3.11",
    "gatsby-plugin-sharp": "2.6.14",
    "gatsby-source-filesystem": "2.3.14",
    "gatsby-transformer-sharp": "2.5.7",
    "leaflet": "1.6.0",
    "moment": "2.27.0",
    "places.js": "1.19.0",
    "prop-types": "^15.7.2",
    "query-string": "^6.13.1",
    "react": "16.13.1",
    "react-dates": "^21.8.0",
    "react-dom": "16.13.1",
    "react-ga": "^3.1.2",
    "react-helmet": "6.1.0",
    "react-icons": "3.10.0",
    "react-leaflet": "2.7.0",
    "react-with-direction": "1.3.1"
  },
  "devDependencies": {
    "csv-parse": "4.8.8",
    "fs-extra": "9.0.0",
    "gatsby-plugin-sitemap": "2.3.6",
    "prettier": "2.0.5",
    "rimraf": "3.0.2",
    "sitemap": "6.2.0",
    "sync-request": "6.1.0"
  },
  "keywords": [
    "gatsby"
  ],
  "license": "0BSD",
  "scripts": {
    "regen": "sh ./scripts/regen.sh",
    "cms-build-incremental": "node --max-old-space-size=10240 cms/index.js incremental",
    "build": "export NODE_OPTIONS=--max_old_space_size=10240 && gatsby build",
    "develop": "export NODE_OPTIONS=--max_old_space_size=10240 && gatsby develop",
    "format": "prettier --write \"**/*.{js,jsx,ts,tsx,json,md}\"",
    "start": "npm run develop",
    "build-serve": "export NODE_OPTIONS=--max_old_space_size=10240 && gatsby build && gatsby serve",
    "serve": "export NODE_OPTIONS=--max_old_space_size=10240 && gatsby serve",
    "clean": "gatsby clean",
    "test": "echo \"Write tests! -> https://gatsby.dev/unit-testing\" && exit 1"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/gatsbyjs/gatsby-starter-default"
  },
  "bugs": {
    "url": "https://github.com/gatsbyjs/gatsby/issues"
  }
}

gatsby-node.js:


const path = require('path');
const fs = require('fs');

// disable source code map.
exports.onCreateWebpackConfig = ({ actions, stage }) => {
    // If production JavaScript and CSS build
    if (stage === 'build-javascript') {
      // Turn off source maps
      actions.setWebpackConfig({
        devtool: false,
      })
    }
  };

// speed up build
exports.createSchemaCustomization = ({ actions }) => {
    actions.createTypes(`
      type SitePage implements Node @dontInfer {
        path: String!
      }
    `)
  }

exports.createPages = ({ boundActionCreators }) => {
    const { createPage } = boundActionCreators;
    
    // read a file that contains an array of ~17k JSON file names; each one is used to build a page
    const layoutFile = fs.readFileSync('./cms/build/layoutdir.json', 'utf8');
    const layoutDir = JSON.parse(layoutFile);
    
    // Pick up template
    const template = path.resolve(`./src/templates/template.js`);
   
    // for each json file, read the content and use it as pageData (~17k files in total)
    for (var i = 0; i < layoutDir.length; i++){
        var layoutFilePath = layoutDir[i];

        const data = fs.readFileSync(layoutFilePath, 'utf8');
        const pageData = JSON.parse(data);
        const route = "/" + pageData.destinationFile;

        createPage({
            path: route,
            component: template,

            context: {
                pageData: pageData
            }
        });
    }
}

gatsby-browser.js:

// make sure every click brings user to the top of new page.
exports.shouldUpdateScroll = ({
    routerProps: { location },
    getSavedScrollPosition,
  }) => {
    return false;
}

gatsby-ssr.js: N/A

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 22 (12 by maintainers)

Top GitHub Comments

4 reactions
KyleAMathews commented, Aug 30, 2020

Try turning off the sitemap plugin? That seems to be what’s going really slowly in onPostBuild.

Your createPages call seems very slow as well. You could add some console.time calls there to see which parts are slow. Moving away from fs sync calls could help, as a sync read blocks everything else while the file is being read. The fastest way to do this is probably to create a queue of all the files and then process them with a concurrency of 30-50 (test different values). The queue we use inside Gatsby is https://www.npmjs.com/package/better-queue, which works pretty well.

2 reactions
adonig commented, Sep 24, 2020

I wonder what’s going on in the createPages step. 7 minutes is quite excessive for this size.

@pvdz Thank you for making me look into that! It was a not-so-smart loop firing a graphql query for each of the ~30k products to determine a small bunch of “similar” products from the same product category. I thought about it twice and found a way to reuse the graphql results from the category page creation step to create a category-to-products mapping in memory. That way it was possible to get that calculation down from 5 minutes to 20 seconds 😄
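The in-memory mapping described above might look roughly like the following sketch. The data shape is an assumption made for illustration (products as plain objects with `id` and `category` fields), and the function names groupByCategory and similarProducts are hypothetical; the comment does not show the actual code.

```javascript
// Build a category -> products lookup once, from results already in memory,
// instead of running one GraphQL query per product.
function groupByCategory(products) {
  const byCategory = new Map();
  for (const product of products) {
    const list = byCategory.get(product.category) || [];
    list.push(product);
    byCategory.set(product.category, list);
  }
  return byCategory;
}

// Pick up to `count` other products from the same category via the lookup.
function similarProducts(product, byCategory, count) {
  return (byCategory.get(product.category) || [])
    .filter((other) => other.id !== product.id)
    .slice(0, count);
}
```

The lookup is built in a single pass over the products, so finding similar items for each product no longer needs a query of its own.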
