[gatsby-source-contentful] downloadLocal broken by gatsby-source-filesystem

Description

#20843 introduced a timeout for createRemoteFileNode. I’m almost certain this breaks localFile for Contentful projects with roughly 15 or more assets.

I pinned the transitive dependency on gatsby-source-filesystem to 2.1.47 (right before #20843) and the issue went away.

Steps to reproduce

Use gatsby-source-contentful with downloadLocal enabled. If gatsby develop takes more than 30 seconds, createRemoteFileNode will silently time out. The build will complete, but most localFile fields in GraphiQL will be null.

Expected result

localFile fields are populated.

Actual result

localFile fields are null

Other Notes

I think all of the createRemoteFileNode calls are actually completing, but the timeout has some nasty side effect.

I’d love to see this reverted, as I currently have to resort to the very hacky npm-force-resolutions.

Environment

System:
  OS: Linux 4.4 Ubuntu 18.04.4 LTS (Bionic Beaver)
  CPU: (8) x64 Intel® Core™ i7-8550U CPU @ 1.80GHz
  Shell: 4.4.20 - /bin/bash
Binaries:
  Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
  Yarn: 1.22.1 - /usr/bin/yarn
  npm: 6.13.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
  Python: 2.7.17 - /usr/bin/python
npmPackages:
  gatsby: ^2.17.4 => 2.20.20
  gatsby-image: ^2.2.30 => 2.3.2
  gatsby-plugin-brotli: ^1.3.1 => 1.3.1
  gatsby-plugin-emotion: ^4.1.18 => 4.2.1
  gatsby-plugin-manifest: ^2.2.41 => 2.3.3
  gatsby-plugin-netlify: ^2.1.32 => 2.2.1
  gatsby-plugin-postcss: ^2.1.16 => 2.2.1
  gatsby-plugin-prefetch-google-fonts: 1.4.3 => 1.4.3
  gatsby-plugin-react-helmet: ^3.1.13 => 3.2.2
  gatsby-plugin-react-svg: ^3.0.0 => 3.0.0
  gatsby-plugin-remove-fingerprints: 0.0.2 => 0.0.2
  gatsby-plugin-resolve-src: ^2.0.0 => 2.0.0
  gatsby-plugin-sharp: ^2.2.32 => 2.5.4
  gatsby-source-contentful: ^2.1.73 => 2.2.7
  gatsby-transformer-remote-filesystem: ^0.2.0 => 0.2.0
  gatsby-transformer-sharp: ^2.3.0 => 2.4.4
npmGlobalPackages:
  gatsby-cli: 2.11.8

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 23 (6 by maintainers)

Top GitHub Comments

19 reactions
GrtDev commented, Apr 25, 2020

Got the same result here as well:

success Downloading remote files - 30.130s - 56/93 3.09/s

Random localFile data is missing for some images. The download gets cut off right at the 30-second mark.

I did some further digging:

It seems the timeout is created in requestRemoteNode.

https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L152-L157

All the request promises are created at the same time, but the actual requests are only processed in order. As a result, the requests at the back of the queue time out and fail.
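To make the effect concrete, here is a simplified model of it. This is a sketch, not Gatsby's code: it assumes every request's timeout clock starts when its promise is created, while only `slots` downloads make progress at once. The function name `simulateQueue` and all the numbers are hypothetical.

```javascript
// Simplified model (an assumption, not gatsby-source-filesystem's code):
// every task's timeout clock starts at t = 0, but only `slots` tasks
// actually download at once, each taking `perFileMs`.
function simulateQueue({ files, perFileMs, timeoutMs, slots }) {
  let completed = 0;
  let timedOut = 0;
  for (let i = 0; i < files; i++) {
    // Tasks are served in order, `slots` at a time.
    const startedAt = Math.floor(i / slots) * perFileMs;
    const finishedAt = startedAt + perFileMs;
    if (finishedAt <= timeoutMs) completed++;
    else timedOut++;
  }
  return { completed, timedOut };
}

// Hypothetical numbers: 93 files, 500 ms each, served one at a time.
console.log(
  simulateQueue({ files: 93, perFileMs: 500, timeoutMs: 30000, slots: 1 })
);
// prints { completed: 60, timedOut: 33 }
```

Under these assumptions the files at the back of the queue never get a chance to finish, even though each individual download is fast.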

The timeout error is not handled in download-contentful-assets and therefore fails silently.

https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-contentful/src/download-contentful-assets.js#L88-L90

When actually logging the error you get the following trace:

failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
...
...
failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
success Downloading remote files - 30.618s - 39/93 3.04/s

Because the error is not handled, the result is still reported as a success even though the website will not run properly due to missing data. This error needs to be handled appropriately.

As for the requests failing, the issue seems to be that too many of them are fired off at the same time but are only processed in order.
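One way to surface such failures, as a rough sketch rather than the plugin's actual fix: collect every rejected download and fail loudly at the end. `downloadAll` and `downloadOne` are hypothetical names; `downloadOne` stands in for whatever fetches a single asset. `Promise.allSettled` needs Node 12.9 or newer.

```javascript
// Hypothetical helper: `downloadOne` stands in for whatever fetches a
// single asset (for example a wrapper around createRemoteFileNode).
async function downloadAll(urls, downloadOne) {
  // allSettled waits for every download instead of stopping at the first error.
  const results = await Promise.allSettled(urls.map((url) => downloadOne(url)));

  const failures = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      failures.push({ url: urls[i], reason: result.reason });
    }
  });

  if (failures.length > 0) {
    const details = failures
      .map((f) => `${f.url}: ${f.reason && f.reason.message}`)
      .join("\n");
    // Throwing here makes the build fail instead of reporting success.
    throw new Error(
      `${failures.length}/${urls.length} downloads failed\n${details}`
    );
  }

  return results.map((result) => result.value);
}
```

Whether a failed asset should abort the build or only produce a warning is a design choice; the point is that the caller gets to see the error.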

https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L76

It seems gatsby-source-filesystem assumes you can download 200 files concurrently, but this might not work with the Contentful API. I don’t know what the limit is here. The concurrency is, however, adjustable via an environment variable.

Setting the following config seems to fix the timeout issue for me:

gatsby-config.js

process.env.GATSBY_CONCURRENT_DOWNLOAD = 1

New output:

success Downloading remote files - 137.409s - 93/93 0.68/s

As for the timeout, 30 seconds is a good default, but it might not be enough for larger files (or a slow connection). Perhaps this should be adjustable, possibly also through an environment variable.
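A minimal sketch of what such a setting could look like. The variable name GATSBY_REMOTE_FILE_TIMEOUT_MS and the function `resolveTimeoutMs` are invented for illustration and are not existing Gatsby settings.

```javascript
// GATSBY_REMOTE_FILE_TIMEOUT_MS is a hypothetical variable name used for
// illustration; it is not an existing Gatsby setting.
const DEFAULT_TIMEOUT_MS = 30000;

function resolveTimeoutMs(env = process.env) {
  const parsed = Number.parseInt(env.GATSBY_REMOTE_FILE_TIMEOUT_MS, 10);
  // Fall back to the default for missing, non-numeric, or non-positive values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
}
```

For example, `resolveTimeoutMs({ GATSBY_REMOTE_FILE_TIMEOUT_MS: "120000" })` yields 120000, while an unset or malformed value keeps the 30-second default.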

@mjmaurer Maybe reopen this issue as more people seem to encounter this problem?

6 reactions
shanekenney commented, May 20, 2020

I’ve looked into this a bit, and from what I can see there are a few issues at play here:

  1. The gatsby-source-contentful plugin swallows exceptions from createRemoteFileNode. These will most likely be networking errors, for example when downloads time out or something else unexpected happens, like a TCP connection being reset. gatsby-source-contentful then assumes the file has been downloaded successfully when it hasn’t, and errors crop up later in the build when null references are hit.

  2. The 30s timeout that the got HTTP client is configured with. It’s possible this will be hit if you’re downloading a large asset and you don’t have the bandwidth to complete the download in 30s. The easiest way to reproduce this is to throttle your network connection and run a build. On macOS, I used the Network Link Conditioner. Note: The asset size limit in Contentful is 1 GB.

  3. The default number of concurrent downloads in create-remote-file-node. The default is 200, and this seems to cause all sorts of problems for me when running a local build in a large Contentful space. Since there are more downloads happening concurrently, a timeout is more likely for any individual file, and I’m also seeing the occasional connection reset before a timeout happens. It’s likely this is less of an issue if you’ve got a high-bandwidth connection to Contentful’s asset CDN (aka CloudFront), but I wonder if this is a sensible default from a reliability standpoint. Maybe this could be determined more intelligently, e.g. by performing some kind of exponential backoff when network errors are encountered.

I’m going to start working on a PR to fix point 1 immediately. I don’t think the Contentful source plugin should ever swallow errors. I would love to get someone’s thoughts on points 2 & 3. Happy to work on these as well.
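The exponential backoff idea from point 3 could be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: `backoffDelay`, `sleep`, and `withBackoff` are hypothetical names, and `task` is any function that returns a promise, such as a single download.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries `task` (any function returning a promise)
// up to `retries` extra times, waiting longer after each failure.
async function withBackoff(
  task,
  { retries = 3, baseMs = 500, maxMs = 30000 } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}
```

With the defaults, the waits between attempts are 500 ms, 1000 ms, and 2000 ms. A real implementation would probably retry only on network errors and might also lower the concurrency after repeated failures.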
