Occasional empty / zero byte streams at high load

Hi!

First of all, thank you for this great library! I’ve been using it for a couple of years now and it really makes the GraphQL developer experience better 🙂.

Unfortunately, we’re seeing occasional errors in our production environment on GKE where the stream from createReadStream() has zero bytes, even under very little load and even though the whole multipart request was received (I added a PassThrough stream to print out the whole request when the stream length is zero).

I have been able to replicate this on a local K8s cluster by load testing the service at 40-60 rps; it occurs in roughly 1 in 5000 requests. The service in question is a GraphQL gateway: we use processRequest from graphql-upload v13.0.0 as a fastify middleware to process the body, then pass it to apollo-server-fastify and @apollo/gateway, with apollo-federation-file-upload as the data source that replays the file upload.

My initial assumption was that it had something to do with the networking/parsing layer. To check, I ran a separate test that fed the received request body directly to dicer in a loop of more than 100,000 iterations, but it produced no multipart parsing errors.

I have a couple of theories as to why this happens.

  1. While digging into fs-capacitor I learned that every file upload creates a temporary file, so maybe under high load it fails to create the temporary file, possibly because too many files are open? I’m wondering whether this is related to this issue.
  2. During debugging, whenever I encountered the zero-byte stream, onData and onEnd were never called. I’m not very experienced with streams, but would it be possible for createReadStream() to execute before the data is written to the FileStream?
  3. It may also be a combination of the two: under high load, disk writes can have high latency, which affects reading the file when createReadStream() is called.

Our solution for now is to migrate to graphql-upload-minimal, since we only use the stream to pass the upload through to the receiving backend service. I haven’t encountered the issue with this setup so far, even at 120 rps.

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

captainjapeng commented, Jun 7, 2022 (2 reactions)

I understand. Our solution with graphql-upload-minimal seems to be working fine and might be the better route for a gateway service. In the meantime, I’ve attached a patch file for use with patch-package, in case other people need an immediate fix.

patches/fs-capacitor+6.2.0.patch

diff --git a/node_modules/fs-capacitor/dist/index.js b/node_modules/fs-capacitor/dist/index.js
index 91c6d96..7b4e772 100644
--- a/node_modules/fs-capacitor/dist/index.js
+++ b/node_modules/fs-capacitor/dist/index.js
@@ -48,7 +48,12 @@ class ReadStream extends stream_1.Readable {
             // If there were no more bytes to read and the write stream is finished,
             // than this stream has reached the end.
             if (this._writeStream._writableState.finished) {
-                this.push(null);
+                // Check if we have consumed the whole file up to where
+                // the write stream has written before ending the stream
+                if (this._pos < this._writeStream._pos)
+                    this._read(n);
+                else
+                    this.push(null);
                 return;
             }
             // Otherwise, wait for the write stream to add more data or finish.
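
The guard added by the patch can be read as a small decision rule for what a reader should do after a read returns no bytes. The sketch below is a simplified, hypothetical model: the function `nextActionAfterEmptyRead` and its parameter names are illustrative and are not part of fs-capacitor. It only shows why comparing the read position with the write position matters once the writer has finished.

```javascript
// Hypothetical model of the end-of-stream decision; not fs-capacitor's code.
// readPos:        bytes the reader has consumed so far
// writePos:       bytes the writer has written so far
// writerFinished: whether the writer has finished
// checkPosition:  true models the patched behaviour, false the unpatched one
function nextActionAfterEmptyRead({ readPos, writePos, writerFinished }, checkPosition) {
  if (!writerFinished) {
    return "wait"; // more data may still arrive
  }
  if (checkPosition && readPos < writePos) {
    return "read-again"; // bytes were written after the empty read was issued
  }
  return "end"; // nothing left to read
}

// A reader that saw an empty read at position 0, while the writer went on to
// write 10 bytes and finish before the "finished" check ran.
const racedState = { readPos: 0, writePos: 10, writerFinished: true };
const unpatched = nextActionAfterEmptyRead(racedState, false);
const patched = nextActionAfterEmptyRead(racedState, true);
console.log(unpatched, patched); // prints "end read-again"
```

In this model the unpatched rule ends the stream with zero bytes delivered, which matches the symptom described in the issue, while the patched rule schedules another read. Whether this is the exact sequence that occurs in production is an assumption.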

jaydenseric commented, Jun 10, 2022 (1 reaction)

It seems the idea of dynamically importing the pure ESM fs-capacitor is not going to work due to a TypeScript issue: https://github.com/microsoft/TypeScript/issues/49055#issuecomment-1151747145

A side note about the dynamic import approach: I had an idea to do the dynamic import on the first function call and store the result in a let outside the function scope, so it can be reused from then on instead of awaiting a promise for the dynamic import again on every function call. But I couldn’t find any information on whether Node.js optimises multiple dynamic import calls of the same module (are calls after the first faster?) or whether avoiding the awaited promise really saves much time or system resources.
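
The caching idea described above might look roughly like the following sketch. It uses the built-in `node:path` module as a stand-in for fs-capacitor so that it runs without dependencies; the names `pathModulePromise`, `loadPath` and `joinSegments` are hypothetical. It caches the promise rather than the resolved module, which is one possible variant of the idea.

```javascript
// Module-scope cache: holds the promise returned by the first import() call.
// "node:path" is only a stand-in for the real ESM dependency.
let pathModulePromise;

function loadPath() {
  if (pathModulePromise === undefined) {
    // Only the first call reaches import(); later calls reuse the promise.
    pathModulePromise = import("node:path");
  }
  return pathModulePromise;
}

// Example consumer: awaits the cached promise on every call.
async function joinSegments(...segments) {
  const { join } = await loadPath();
  return join(...segments);
}

joinSegments("uploads", "tmp").then((joined) => console.log(joined));
```

Every call to `loadPath()` returns the same promise object, so later callers only await an already-created promise. The module loader also caches loaded modules, so a repeated `import()` of the same specifier does not evaluate the module again; the sketch does not measure whether the manual cache is noticeably faster than that.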

Thanks for offering to add a CJS entry point to fs-capacitor, but maybe I should just move graphql-upload to pure ESM and be done with it.
