Large memory allocations and very slow execution when type errors occur in complex inputs

Hi,

We have a rather large, complex object as an input. A single request with hundreds of type errors in the input took 9 seconds in graphql.execute. More than 95% of that time was spent in the inspect() call, specifically in its recursive object string construction, which also caused a huge number of memory allocations. With 20 concurrent requests, we can consistently cause node to run out of memory. Changing inspect() to return an empty string lowered the total execution time of our process to the low hundreds of milliseconds, and the process was stable. I don’t know the exact latency of graphql.execute with inspect() effectively disabled. Given this finding, I am proposing three changes.

  1. inspect() should never be in the execution path as currently written. At the very least, it should not be recursive and/or it should have a maximum string length. In our case, each inspect() result grew to over 100KB.
  2. There should be a maximum number of errors in getVariableValues(). Once the number of errors reaches the maximum, no additional varDefNodes should be processed. In our case, the error count grew to ~1500. Cap at 5?
  3. If inspect() must be kept, there should also be a maximum total length across all error messages in getVariableValues(). If the limit is reached, no more varDefNodes should be processed. This would prevent large single errors from being multiplied by the error cap.
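Proposals 2 and 3 could be sketched roughly as below. This is a minimal illustration, not graphql-js code: `createErrorCollector`, `validateAll`, and the default limits are hypothetical names and values chosen for the example.

```javascript
// Hypothetical helper (not part of graphql-js): collects error messages
// until either a count cap or a total-length cap is reached.
function createErrorCollector({ maxErrors = 5, maxTotalLength = 10000 } = {}) {
  const errors = [];
  let totalLength = 0;
  let truncated = false;

  return {
    // Returns false once a limit is reached, signalling the caller to stop.
    add(message) {
      if (truncated) return false;
      const overCount = errors.length >= maxErrors;
      const overLength = totalLength + message.length > maxTotalLength;
      if (overCount || overLength) {
        truncated = true;
        return false;
      }
      errors.push(message);
      totalLength += message.length;
      return true;
    },
    get errors() {
      return errors;
    },
    get truncated() {
      return truncated;
    },
  };
}

// Hypothetical driver: stops processing further inputs once the collector
// refuses a message. `validate` returns a string on failure, null on success.
function validateAll(inputs, validate, limits) {
  const collector = createErrorCollector(limits);
  for (const input of inputs) {
    const message = validate(input);
    if (message !== null && !collector.add(message)) break;
  }
  return collector;
}
```

With both caps in place, the worst-case memory held by error messages per request is bounded by the length cap, regardless of how many inputs are invalid.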

Given that we had ~1500 errors, each error was ~100KB and there were 20 concurrent requests, that is 3GB of memory. That is more than the default max-old-space-size.
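The estimate above can be checked with a few lines of arithmetic. This assumes decimal units (1KB = 1000 bytes) and treats each error message as ~100KB of retained memory, which is a simplification.

```javascript
// Back-of-the-envelope check of the figures quoted in the issue.
const errorsPerRequest = 1500;
const bytesPerError = 100 * 1000; // ~100KB per error message (decimal units assumed)
const concurrentRequests = 20;

const bytesPerRequest = errorsPerRequest * bytesPerError; // 150MB per request
const totalBytes = bytesPerRequest * concurrentRequests;
const totalGB = totalBytes / 1e9;

console.log(`${bytesPerRequest / 1e6}MB per request, ${totalGB}GB in total`);
```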

Item 2 is an easy fix but 1 and 3 will make the execution path much more robust. We are in an air-gapped environment so I cannot easily provide a reproducible case.

        const properties = Object.keys(value)
          .map(k => `${k}: ${inspect(value[k])}`)
          .join(', ');
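The snippet above recurses into every property without any bound. A bounded variant, along the lines of proposal 1, might look like the sketch below. It is an illustration only: `boundedInspect` and the `MAX_DEPTH`, `MAX_ITEMS`, and `MAX_LENGTH` limits are hypothetical and not taken from graphql-js.

```javascript
// Hypothetical limits; the values are assumptions chosen for illustration.
const MAX_DEPTH = 2; // nesting levels expanded before using a placeholder
const MAX_ITEMS = 10; // properties or array items printed per level
const MAX_LENGTH = 200; // maximum characters returned per call

function boundedInspect(value, depth = 0) {
  return truncate(formatValue(value, depth));
}

function formatValue(value, depth) {
  if (typeof value === 'string') return JSON.stringify(value);
  if (value === null || typeof value !== 'object') return String(value);

  // Past the depth limit, print a placeholder instead of recursing.
  // This also stops circular references from recursing forever.
  if (depth >= MAX_DEPTH) return Array.isArray(value) ? '[Array]' : '[Object]';

  if (Array.isArray(value)) {
    const items = value
      .slice(0, MAX_ITEMS)
      .map(item => boundedInspect(item, depth + 1));
    if (value.length > MAX_ITEMS) items.push('...');
    return `[${items.join(', ')}]`;
  }

  const keys = Object.keys(value);
  const properties = keys
    .slice(0, MAX_ITEMS)
    .map(k => `${k}: ${boundedInspect(value[k], depth + 1)}`);
  if (keys.length > MAX_ITEMS) properties.push('...');
  return properties.length > 0 ? `{ ${properties.join(', ')} }` : '{}';
}

function truncate(text) {
  return text.length > MAX_LENGTH
    ? text.slice(0, MAX_LENGTH - 3) + '...'
    : text;
}
```

Because every level is capped in depth, item count, and length, the size of the returned string no longer depends on the size of the input object.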

graphql 14.1.1, node 8.11.3

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

3 reactions
IvanGoncharov commented, Mar 4, 2019

“how is Facebook running this?”

FB has its own GraphQL implementation in Hack (a dialect of PHP), and graphql-js is used only for Relay and other client-side tools. Disclaimer: I’m not an FB employee, so I may be missing something.

“This is really embarrassing,”

“I strive to be very sympathetic and patient with open source projects and maintainers.”

I have been helping to maintain this project for the last few years, mostly in my own free time. Matt from FB helps a lot, but he is also doing it in his own free time.

Shaming maintainers for not dedicating their own free time to your issue is not very productive. As a contractor, I have ongoing projects and my responsibilities to customers so I can’t immediately stop whatever I’m doing and switch to fixing this issue.

I implemented a partial fix in #1771, and I will look into getVariableValues in the next few days.

1 reaction
SoyYoRafa commented, Mar 2, 2019

I strive to be very sympathetic and patient with open source projects and maintainers. But I have to agree with @daniele-orlando’s characterization of the severity. I am highly surprised that no emergency fix has been released in 8 days. There must be a lot of vulnerable servers, and the code as currently written is simply unacceptable in any serious production environment. How is Facebook running this? I am happy to send a pull request to stop inspect() from being recursive.
