Intermittent slow requests from NEST
NEST/Elasticsearch.Net version: 7.13.2
Elasticsearch version: 7.16.1
.NET runtime version: .NET 5.0
Operating system version: Debian GNU/Linux 10 (buster)
Description of the problem including expected versus actual behavior: More than 99% of queries of all sorts (search, scroll, get document) run in <100ms, usually a lot less than that. Occasionally, however, the NEST client takes longer, sometimes a lot longer: 1 second, 3 seconds, or more.
Slowlog is configured on the server to log anything above 0.1 sec, and even in cases where NEST reports an HTTP request to Elasticsearch taking >3sec, nothing is logged, so I believe the delay is client-side.
The client and server are on the same Kubernetes cluster. CPU and RAM use is low on both client and server.
I’ve monitored for thread pool exhaustion, and at one point saw the thread pool queue size climb during a delay. I added a SetMinThreads call and have not seen much of a queue since, but the issue persists.
Here’s a recent example:
I went so far as to capture a packet trace, which shows a different example, where NEST took about 1 second to do the request but nothing appeared in the slowlog:
I believe this shows the client making a TCP connection, but then waiting for almost a second before sending the request.
I’ll admit I’m not au fait with the NEST/ES.NET codebase, so given an hour or so of digging, I couldn’t make my way to find where it is that HTTP requests are issued! So I can’t understand if this even could be a bug in NEST/ES.NET, or if it’s a .NET thing.
I’ve also (I think) ruled out network connectivity by running a bash script on the same host to replay a request which triggered this issue via NEST to the server in a tight loop. After thousands of iterations, nothing took more than 100ms.
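For reference, a replay loop of the kind described above might look like the following sketch. It is not the script that was actually run: the function name replay, the 100 ms threshold, and the host, index, and query file in the example invocation are all assumptions for illustration. It relies on GNU date for nanosecond timestamps, which is available on Debian.

```shell
#!/usr/bin/env bash
# Run a command repeatedly and report every iteration slower than a threshold.
# Usage: replay ITERATIONS THRESHOLD_MS COMMAND [ARGS...]
# The body runs in a subshell (note the parentheses) so its variables stay local.
replay() (
  iterations=$1
  threshold_ms=$2
  shift 2
  slow=0
  i=1
  while [ "$i" -le "$iterations" ]; do
    start=$(date +%s%N)            # nanoseconds since epoch (GNU date)
    "$@" > /dev/null 2>&1          # the replayed request; output discarded
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    if [ "$elapsed_ms" -gt "$threshold_ms" ]; then
      echo "iteration $i took ${elapsed_ms} ms"
      slow=$((slow + 1))
    fi
    i=$((i + 1))
  done
  echo "slow=$slow of $iterations"
)

# Hypothetical invocation; the host, index, and query.json are placeholders:
# replay 5000 100 curl -s -H 'Content-Type: application/json' \
#   -d @query.json http://elasticsearch:9200/my-index/_search
```

Note that each curl invocation starts a new process and opens a fresh connection, so a loop like this exercises the network path and the server but not the connection reuse of a long-lived HTTP client.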
Steps to reproduce: Currently, this occurs intermittently in the test build of our app, with only one user but an intermittent workload profile similar to the above Kibana screenshot.
Expected behavior: Requests don’t experience random pauses.
Provide ConnectionSettings (if relevant):
Provide DebugInformation (if relevant):
Top GitHub Comments
Re the packet trace, it shows that an HTTP connection was started and established quickly (0.7ms), and that the delay was then on the client side, i.e. >940ms of waiting before the request was sent. The “0” time is just when the first packet to start establishing the TCP connection occurred. To rephrase the packet trace:
So, the delay is all client-side.
I’ll try using dotnet-trace to listen for DiagnosticSource events… thanks!
Closing as stale.