[BUG] 429 too many requests - heap issue
Describe the bug The issue is very similar to #1583 and the symptoms are the same: after some time, requests start failing with a 429 (Too Many Requests) response code. However, the response body includes:
(apologies for the low res…) You can see:
data too large, data for... which is larger than the limit of ....
That tipped us off since it ties the issue to JVM heap memory. In fact, on restart the issue is cleared. Looking at our monitoring, the JVM heap memory shows a clear leak:
The y-axis shows % heap used. The large dip corresponds to the restart of OpenSearch. Both before and after the restart there is a subtle but clear upward trend, i.e. a leak.
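For anyone who wants to check for the same pattern without a monitoring stack, the following is a minimal sketch that reads heap usage and the parent circuit breaker counters from the node stats API and estimates the heap trend. It assumes an unsecured node on `http://localhost:9200` (an assumption, adjust as needed); the helper names `heap_summary`, `leak_slope` and `fetch_node_stats` are made up for illustration. Raw heap samples include normal GC sawtooth, so the slope is only a rough indicator over long windows.

```python
import json
import urllib.request


def heap_summary(node_stats):
    """Extract heap usage and parent-breaker counters per node.

    `node_stats` is the parsed JSON from GET /_nodes/stats/jvm,breaker.
    """
    rows = []
    for node_id, node in node_stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        parent = node.get("breakers", {}).get("parent", {})
        rows.append({
            "node": node.get("name", node_id),
            "heap_used_percent": mem["heap_used_percent"],
            # Number of times the parent breaker rejected a request.
            "parent_breaker_tripped": parent.get("tripped", 0),
        })
    return rows


def leak_slope(samples):
    """Least-squares slope of heap % over time.

    `samples` is a list of (timestamp_seconds, heap_used_percent) pairs.
    Returns percent per second; a persistently positive value over many
    hours hints at a leak rather than normal GC churn.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    num = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch JVM and breaker stats (base_url is an assumed default)."""
    url = f"{base_url}/_nodes/stats/jvm,breaker"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling `heap_summary(fetch_node_stats())` on a schedule and feeding the `(time, heap_used_percent)` pairs into `leak_slope` gives a number that can be compared before and after a restart.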
To Reproduce Difficult to reproduce. We’re not doing anything special in terms of setup. The setup is:
- bare-metal OpenSearch
- single server
- indexing done via bulk API index calls
- high write activity, low read activity (we index way more than we query)
- more details will be posted in a comment below
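To make the workload in the list above concrete, here is a rough sketch of bulk indexing with a simple retry on 429. It is not our actual indexing code: the index name, the `http://localhost:9200` address, and the helper names `build_bulk_body`, `backoff_delays` and `post_bulk` are all assumptions for illustration. Note that retrying only masks the symptom; it does nothing about the heap growth itself.

```python
import json
import time
import urllib.error
import urllib.request


def build_bulk_body(index, docs):
    """Build the NDJSON payload for POST /_bulk using `index` actions."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def post_bulk(base_url, body, max_attempts=5):
    """POST a bulk body, retrying only when the server answers 429."""
    req = urllib.request.Request(
        f"{base_url}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    for attempt, delay in enumerate(backoff_delays(max_attempts), start=1):
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 429, or the final 429.
            if err.code != 429 or attempt == max_attempts:
                raise
            time.sleep(delay)
```

Example: `post_bulk("http://localhost:9200", build_bulk_body("logs-test", [{"msg": "hello"}]))`, where `logs-test` is a hypothetical index name.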
Expected behavior No heap memory leak
Plugins All default plugins except “security” and “ilm”, more details in comment below
Host/Environment (please complete the following information):
- OS: CentOS 7
- Version: OpenSearch 1.2.3
Additional context There doesn't seem to be a high queue or thread count when the problem starts. We increased the JVM -Xmx and -Xms settings from 30 GB to 37 GB, which made little difference.
Top GitHub Comments
@amitmun the server displaying these symptoms is in production so it’s difficult to reproduce properly. What we have noticed is that “closing” older indices using the close api seems to keep this under control…
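A minimal sketch of that workaround, for anyone who wants to script it: list indices with `_cat/indices`, pick the open ones older than a cutoff, and close them with `POST /<index>/_close`. The 30-day threshold in the usage note, the `http://localhost:9200` address, and the function names are assumptions for illustration, not what runs on our server. Indices whose names start with `.` are skipped so that system indices stay open.

```python
import json
import urllib.parse
import urllib.request


def indices_to_close(cat_rows, max_age_days, now_ms):
    """Pick open, non-system indices created before the cutoff.

    `cat_rows` is the parsed JSON from
    GET /_cat/indices?format=json&h=index,status,creation.date
    where creation.date is epoch milliseconds (returned as a string).
    """
    cutoff_ms = now_ms - max_age_days * 24 * 60 * 60 * 1000
    return sorted(
        row["index"]
        for row in cat_rows
        if row.get("status") == "open"
        and not row["index"].startswith(".")
        and int(row["creation.date"]) < cutoff_ms
    )


def list_indices(base_url="http://localhost:9200"):
    """Fetch index name, status and creation date for all indices."""
    url = f"{base_url}/_cat/indices?format=json&h=index,status,creation.date"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def close_index(base_url, index):
    """Close a single index via POST /<index>/_close."""
    path = urllib.parse.quote(index, safe="")
    req = urllib.request.Request(f"{base_url}/{path}/_close", method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `for name in indices_to_close(list_indices(), 30, int(time.time() * 1000)): close_index("http://localhost:9200", name)` after `import time`. Closed indices cannot be searched or written to until they are reopened, so the cutoff should match the retention you actually query.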
We see exactly the same behavior with OpenSearch 1.2.4 on RHEL and are currently working on creating an environment to reproduce it so we can try to resolve it. Everything works for a while until GC starts to go wild and almost all requests fail (including simple requests like getting a single doc by ID). Restarting is the only workaround we currently have. @dvas0004 - have you found the root cause of this issue yet? If we manage to reproduce it, I can share heap dumps and heap histograms and test possible fixes such as upgrading OpenSearch.