Intermittent slow requests from NEST

NEST/Elasticsearch.Net version: 7.13.2

Elasticsearch version: 7.16.1

.NET runtime version: .NET 5.0

Operating system version: Debian GNU/Linux 10 (buster)

Description of the problem including expected versus actual behavior: More than 99% of queries of all sorts (search, scroll, get document) run in <100ms, usually a lot less than that. Occasionally, however, the NEST client takes longer, sometimes a lot longer: 1 second, 3 seconds, or more.

Slowlog is configured on the server to log anything above 0.1 sec, and even in cases where NEST reports an HTTP request to Elasticsearch taking >3sec, nothing is logged, so I believe it’s client-side.
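
One way to cross-check this reasoning for search requests, sketched in Python rather than the app's own C#: Elasticsearch search responses include a `took` field (server-side time in milliseconds), which can be compared with the duration the client measured. The function name `attribute_latency` and the sample numbers are illustrative assumptions, not values from the issue. `took` is measured on the coordinating node while the slowlog works per shard, so the comparison is only approximate.

```python
def attribute_latency(client_elapsed_ms, server_took_ms, slowlog_threshold_ms=100):
    """Split a client-observed duration into server time and everything else.

    client_elapsed_ms: duration measured around the client call
    server_took_ms:    the `took` value from the search response
    """
    outside_ms = client_elapsed_ms - server_took_ms
    return {
        "server_ms": server_took_ms,
        # Time spent anywhere but in the server's search execution:
        # client-side queuing, connection setup, network, serialization.
        "outside_server_ms": outside_ms,
        # Rough indicator of whether a slowlog entry would be plausible.
        "server_over_threshold": server_took_ms > slowlog_threshold_ms,
    }


# Hypothetical request: the client saw 3000 ms, the response said took=12.
print(attribute_latency(3000, 12))
```

If `outside_server_ms` dominates and `server_over_threshold` is false, an empty slowlog is consistent with the delay sitting outside the server.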

The client and server are on the same Kubernetes cluster. CPU and RAM use is low on both client and server.

I’ve monitored for thread pool exhaustion. At one point I saw the thread pool queue size climb during a delay, so I added a SetMinThreads call. I haven’t seen much of a queue since, but the issue persists.

Here’s a recent example:

image

I went so far as to capture a packet trace, which shows a different example, where NEST took about 1 second to do the request but nothing appeared in the slowlog:

image

I believe this shows the client making a TCP connection, but then waiting for almost a second before sending the request.

I’ll admit I’m not au fait with the NEST/ES.NET codebase, and after an hour or so of digging I couldn’t find where HTTP requests are actually issued. So I can’t tell whether this could even be a bug in NEST/ES.NET, or whether it’s a .NET thing.

I’ve also (I think) ruled out network connectivity by running a bash script on the same host to replay a request which triggered this issue via NEST to the server in a tight loop. After thousands of iterations, nothing took more than 100ms.
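
The bash script itself is not shown in the issue. As a rough sketch of the same idea in Python, the loop below replays one request many times and records every iteration slower than a threshold. The helper names `replay` and `make_sender`, and the URL and body in the usage comment, are hypothetical.

```python
import time
import urllib.request


def replay(send, iterations, threshold_s=0.1):
    """Call send() repeatedly; return (iteration, seconds) for each slow call."""
    slow = []
    for i in range(iterations):
        start = time.perf_counter()
        send()
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            slow.append((i, elapsed))
    return slow


def make_sender(url, body, timeout_s=10):
    """Build a send() that POSTs a JSON body to url and reads the full reply."""
    def send():
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            resp.read()
    return send


# Usage (hypothetical host, index and query):
# send = make_sender("http://elasticsearch:9200/my-index/_search",
#                    b'{"query": {"match_all": {}}}')
# print(replay(send, iterations=5000))
```

Like a loop of separate command-line requests, `urllib.request` opens a fresh connection for each call. A clean run therefore argues against a network or server problem, but it does not exercise whatever connection handling the .NET client performs.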

Steps to reproduce: Currently, this occurs intermittently in the test build of our app, with only one user but an intermittent workload profile similar to the Kibana screenshot above.

Expected behavior: Requests don’t experience random pauses.

Provide ConnectionSettings (if relevant):

Provide DebugInformation (if relevant):

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

kierenj commented, Mar 28, 2022

Re the packet trace: it shows that an HTTP connection was started and established quickly (0.7ms), and that the delay was then with the client, i.e. >940ms of waiting. The “0” time is just when the first packet to start establishing the TCP connection occurred. To rephrase the packet trace:

  1. Client sends first SYN packet to establish connection with server
  2. (0.7ms passes)
  3. Server responds back to client with SYN/ACK, to say connection will be accepted
  4. (0.02ms passes)
  5. Client sends ACK back to server. At this point, TCP connection is established and both sides know it
  6. (945ms inexplicably passes)
  7. Client sends HTTP request to server

So, the delay is all client-side.
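
The arithmetic behind that conclusion can be sketched in a few lines of Python. The gaps are the ones from the numbered list above; the tuple layout and function names are illustrative. The reasoning is that whoever sends the packet after a gap is the side that was idle during it.

```python
from itertools import accumulate

# (sender, packet, gap_ms): gap_ms is the time since the previous packet.
PACKETS = [
    ("client", "SYN", 0.0),
    ("server", "SYN/ACK", 0.7),
    ("client", "ACK", 0.02),
    ("client", "HTTP request", 945.0),
]


def timestamps(packets):
    """Return the cumulative time in ms at which each packet was sent."""
    return list(accumulate(gap for _, _, gap in packets))


def longest_wait(packets):
    """Return the packet that followed the longest gap.

    The sender of that packet is the side that was idle during the gap.
    """
    return max(packets, key=lambda p: p[2])


for (sender, name, gap), t in zip(PACKETS, timestamps(PACKETS)):
    print(f"{t:8.2f} ms  {sender:6}  {name}  (+{gap} ms)")

sender, name, gap = longest_wait(PACKETS)
print(f"Longest wait: {gap} ms before the {sender} sent {name}")
```

The handshake completes in well under a millisecond, and nearly all of the elapsed time falls in the gap before the client sends its HTTP request.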

I’ll try using dotnet-trace to listen for DiagnosticSource events… thanks!

stevejgordon commented, Apr 11, 2023

Closing as stale.
