[gatsby-source-contentful] downloadLocal broken by gatsby-source-filesystem
Description
#20843 introduced a timeout for createRemoteFileNode. I'm almost certain this breaks localFile for Contentful projects with roughly 15 or more assets. Pinning the transitive dependency on gatsby-source-filesystem to 2.1.47 (the release just before #20843) fixed the issue for me.
Steps to reproduce
Use gatsby-source-contentful with downloadLocal enabled (a minimal config sketch follows below). If gatsby develop takes longer than 30 seconds, createRemoteFileNode will silently time out. The build still completes, but most localFile fields in GraphiQL will be null.
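For context, "downloadLocal enabled" means something along these lines in gatsby-config.js (the credential env var names are placeholders, not from the original report):

```js
// gatsby-config.js — minimal sketch; credential env var names are placeholders
module.exports = {
  plugins: [
    {
      resolve: "gatsby-source-contentful",
      options: {
        spaceId: process.env.CONTENTFUL_SPACE_ID,
        accessToken: process.env.CONTENTFUL_ACCESS_TOKEN,
        // Downloads assets locally and exposes them via the localFile field
        downloadLocal: true,
      },
    },
  ],
}
```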
Expected result
localFile fields are populated.
Actual result
localFile fields are null.
Other Notes
I think all of the createRemoteFileNode calls actually complete, but the timeout has some nasty side effect.
I'd love to see this reverted, as I've had to resort to the very hacky npm-force-resolutions workaround (example below).
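For reference, that workaround amounts to something like this in package.json, assuming the pin to gatsby-source-filesystem 2.1.47 mentioned above (a sketch; other fields stay as they are):

```json
{
  "scripts": {
    "preinstall": "npx npm-force-resolutions"
  },
  "resolutions": {
    "gatsby-source-filesystem": "2.1.47"
  }
}
```

npm-force-resolutions rewrites package-lock.json on preinstall so the transitive dependency resolves to the pinned version.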
Environment
System:
  OS: Linux 4.4 Ubuntu 18.04.4 LTS (Bionic Beaver)
  CPU: (8) x64 Intel® Core™ i7-8550U CPU @ 1.80GHz
  Shell: 4.4.20 - /bin/bash
Binaries:
  Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
  Yarn: 1.22.1 - /usr/bin/yarn
  npm: 6.13.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
  Python: 2.7.17 - /usr/bin/python
npmPackages:
  gatsby: ^2.17.4 => 2.20.20
  gatsby-image: ^2.2.30 => 2.3.2
  gatsby-plugin-brotli: ^1.3.1 => 1.3.1
  gatsby-plugin-emotion: ^4.1.18 => 4.2.1
  gatsby-plugin-manifest: ^2.2.41 => 2.3.3
  gatsby-plugin-netlify: ^2.1.32 => 2.2.1
  gatsby-plugin-postcss: ^2.1.16 => 2.2.1
  gatsby-plugin-prefetch-google-fonts: 1.4.3 => 1.4.3
  gatsby-plugin-react-helmet: ^3.1.13 => 3.2.2
  gatsby-plugin-react-svg: ^3.0.0 => 3.0.0
  gatsby-plugin-remove-fingerprints: 0.0.2 => 0.0.2
  gatsby-plugin-resolve-src: ^2.0.0 => 2.0.0
  gatsby-plugin-sharp: ^2.2.32 => 2.5.4
  gatsby-source-contentful: ^2.1.73 => 2.2.7
  gatsby-transformer-remote-filesystem: ^0.2.0 => 0.2.0
  gatsby-transformer-sharp: ^2.3.0 => 2.4.4
npmGlobalPackages:
  gatsby-cli: 2.11.8
Got the same result here as well:
success Downloading remote files - 30.130s - 56/93 3.09/s
Random localFile data is missing for some images. The download gets cut off right at the 30-second mark.

I did some further digging:
It seems the timeout is created in requestRemoteNode:
https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L152-L157
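Roughly, that means the request is created with a 30-second timeout, something like this (a sketch reconstructed from the discussion; the exact options in the linked lines may differ):

```js
// Sketch only — the timeout lives in the got request options
const responseStream = got.stream(url, {
  headers,
  timeout: 30000, // introduced by #20843; the value at the heart of this issue
})
```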
All the request promises are created at the same time, but the actual requests only run in order. This causes the requests at the bottom of the stack to time out and fail.
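To illustrate that claim, here is a standalone simulation (not Gatsby's code): every promise gets its timer at creation, but a concurrency limit means later downloads start much later, so their timers can expire before they even begin.

```js
// Simulation: 10 fake downloads of 8s each, run 2 at a time,
// each with a 30s timer that starts at promise creation.
const CONCURRENCY = 2
const DOWNLOAD_MS = 8000
const TIMEOUT_MS = 30000

let running = 0
const waiting = []

function download(id) {
  return new Promise(resolve => {
    const start = () => {
      running++
      setTimeout(() => {
        running--
        if (waiting.length) waiting.shift()()
        resolve(`asset ${id}`)
      }, DOWNLOAD_MS)
    }
    if (running < CONCURRENCY) start()
    else waiting.push(start)
  })
}

function withTimeout(promise, ms, id) {
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`asset ${id} timed out`)), ms)
    ),
  ])
}

// All promises (and therefore all timers) are created up front.
Promise.allSettled(
  Array.from({ length: 10 }, (_, i) => withTimeout(download(i), TIMEOUT_MS, i))
).then(results => console.log(results.map(r => r.status)))
// => the first few are "fulfilled", the ones at the back of the queue are "rejected"
```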
The timeout error is not handled in download-contentful-assets and therefore fails silently:
https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-contentful/src/download-contentful-assets.js#L88-L90
When the error is actually logged, the trace shows the request timing out.
Because the error is not handled, the build is still reported as a success even though the site will not run properly due to the missing data. So this error needs to be handled appropriately.
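A sketch of the handling being asked for (illustrative names, not the actual plugin source); the point is that the catch block must surface the failure instead of dropping it:

```js
const { createRemoteFileNode } = require("gatsby-source-filesystem")

// `gatsbyApi` bundles the Node API helpers; names are illustrative
async function downloadAsset({ url, gatsbyApi }) {
  const { cache, store, reporter, actions, createNodeId } = gatsbyApi
  try {
    return await createRemoteFileNode({
      url,
      cache,
      store,
      createNode: actions.createNode,
      createNodeId,
      reporter,
    })
  } catch (err) {
    // Today the equivalent of this catch effectively swallows the error and the
    // build is still reported as a success. Failing loudly (or at least logging
    // and re-throwing) makes the missing localFile data visible.
    reporter.panic(`Failed to download Contentful asset: ${url}`, err)
  }
}
```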
As for the requests failing, the issue seems to be that too many of them are fired off at the same time but they only load in order:
https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L76
It seems gatsby-source-filesystem assumes you can download 200 files concurrently, but that might not work with the Contentful API (I don't know what the limit is there). This is, however, adjustable via an environment variable. Setting the following config seems to fix the timeout issue for me:
gatsby-config.js
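The config snippet itself did not survive in this copy of the comment, but based on the surrounding text it lowers gatsby-source-filesystem's concurrency via the GATSBY_CONCURRENT_DOWNLOAD environment variable; a sketch of that (the value 50 is an arbitrary example, not the original one):

```js
// gatsby-config.js
// Lower gatsby-source-filesystem's concurrent download limit (default 200).
// 50 is an example value only.
process.env.GATSBY_CONCURRENT_DOWNLOAD = process.env.GATSBY_CONCURRENT_DOWNLOAD || "50"

module.exports = {
  plugins: [
    // ...the rest of the config, including gatsby-source-contentful
  ],
}
```

Setting it in the shell instead (`GATSBY_CONCURRENT_DOWNLOAD=50 gatsby develop`) should work just as well.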
With that in place, the new output shows the downloads completing instead of being cut off at the 30-second mark.
As for the timeout, 30 seconds is a good default but might not be enough for larger files (or a slow connection). Perhaps this should be adjustable too, maybe also through an environment variable (sketch below)?
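A sketch of what that could look like inside create-remote-file-node.js; the variable name GATSBY_DOWNLOAD_TIMEOUT is hypothetical and does not exist today:

```js
// Hypothetical: make the hard-coded 30s timeout overridable.
// GATSBY_DOWNLOAD_TIMEOUT is an invented name, not an existing option.
const DOWNLOAD_TIMEOUT = process.env.GATSBY_DOWNLOAD_TIMEOUT
  ? parseInt(process.env.GATSBY_DOWNLOAD_TIMEOUT, 10)
  : 30000
```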
@mjmaurer Maybe reopen this issue as more people seem to encounter this problem?
I've looked into this a bit and from what I can see there are a few issues at play here:

1. The gatsby-source-contentful plugin swallows exceptions from createRemoteFileNode. These will most likely be networking errors if the downloads time out, or something else unexpected like a TCP connection being reset. gatsby-source-contentful then assumes the file has been downloaded successfully when it hasn't, and errors crop up later in the build when null references are hit.
2. The 30s timeout the got HTTP client is configured with. It's possible this will be hit if you're downloading a large asset and you don't have the bandwidth to complete the download in 30s. The easiest way to reproduce this is by throttling your network connection and running a build; on macOS I used the Network Link Conditioner. Note: the asset size limit in Contentful is 1GB.
3. The default number of concurrent downloads in create-remote-file-node. The default is 200, and this seems to cause all sorts of problems for me when running a local build against a large Contentful space. Since more downloads happen concurrently, a timeout is more likely for any individual file, plus I'm also seeing the occasional connection reset before a timeout happens. It's likely this is less of an issue if you have a high-bandwidth connection to Contentful's asset CDN (aka CloudFront), but I wonder if this is a sensible default from a reliability standpoint. Maybe it could be determined more intelligently, e.g. by performing some kind of exponential backoff when network errors are encountered (a rough sketch follows at the end of this comment).

I'm going to start working on a PR to fix point 1 immediately; I don't think the Contentful source plugin should ever swallow errors. I would love to get someone's thoughts on points 2 & 3. Happy to work on these as well.