Zuul leaves some connections in CLOSE_WAIT state for further reuse, but some never get reused and stay in that state forever, blocking further requests

I have a Zuul server that proxies all my requests to autodiscovered (via Eureka) routes.

This works fine most of the time. However, I have noticed some very odd behaviour that occurs sporadically and can only partially be recreated.

After I make multiple simultaneous requests (for example, loading the swagger-ui.html page for a given API description, which fetches not only the page itself but also numerous webjars and resources), some connections end up in the CLOSE_WAIT state:

tcp6       1      0 host:54470 host:37612 CLOSE_WAIT  user      425593599   4396/java
tcp6       1      0 host:57724 host:37612 CLOSE_WAIT  user      426384390   4396/java
tcp6       1      0 host:59402 host:52887 CLOSE_WAIT  user      425517966   4396/java
tcp6       1      0 host:59403 host:52887 CLOSE_WAIT  user      425489000   4396/java
tcp6       1      0 host:59404 host:52887 CLOSE_WAIT  user      425518687   4396/java
tcp6       1      0 host:59405 host:52887 CLOSE_WAIT  user      425469338   4396/java
tcp6       1      0 host:59406 host:52887 CLOSE_WAIT  user      425518688   4396/java
tcp6       1      0 host:59407 host:52887 CLOSE_WAIT  user      425476214   4396/java
tcp6       1      0 host:60118 host:37612 CLOSE_WAIT  user      426773630   4396/java
tcp6       1      0 host:60154 host:37612 CLOSE_WAIT  user      426810662   4396/java
tcp6       1      0 host:60155 host:37612 CLOSE_WAIT  user      426824573   4396/java
tcp6       1      0 host:60156 host:37612 CLOSE_WAIT  user      426821100   4396/java
tcp6       1      0 host:60157 host:37612 CLOSE_WAIT  user      426825547   4396/java
tcp6       1      0 host:60158 host:37612 CLOSE_WAIT  user      426820353   4396/java
tcp6       1      0 host:60159 host:37612 CLOSE_WAIT  user      426618721   4396/java
tcp6       1      0 host:60160 host:37612 CLOSE_WAIT  user      426802727   4396/java
tcp6       1      0 host:60161 host:37612 CLOSE_WAIT  user      426825548   4396/java
tcp6       1      0 host:60162 host:37612 CLOSE_WAIT  user      426824574   4396/java
tcp6       1      0 host:60163 host:37612 CLOSE_WAIT  user      426618722   4396/java
tcp6       1      0 host:60167 host:37612 CLOSE_WAIT  user      426689993   4396/java
tcp6       1      0 host:60168 host:37612 CLOSE_WAIT  user      426618745   4396/java
tcp6       1      0 host:60169 host:37612 CLOSE_WAIT  user      426796620   4396/java
tcp6       1      0 host:60170 host:37612 CLOSE_WAIT  user      426824617   4396/java
tcp6       1      0 host:60171 host:37612 CLOSE_WAIT  user      426827273   4396/java

Process 4396 in this case is my IDE, from which I was debugging the Zuul server. When I refresh the same page in the browser again, many of these connections are closed successfully, although some more pop up after a while. The behaviour also happens, although less frequently, when making numerous cURL requests to any given route.
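To track whether the leak grows over time, it can help to tally CLOSE_WAIT sockets per remote endpoint. The following is a minimal pure-JDK sketch, assuming the column layout of the netstat listing above (protocol, Recv-Q, Send-Q, local address, foreign address, state, ...); `CloseWaitCounter` is a hypothetical helper, not part of Zuul or netstat.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts CLOSE_WAIT sockets per foreign address in
// netstat-style lines. Assumes whitespace-separated columns where column 5 is
// the foreign address and column 6 is the TCP state.
final class CloseWaitCounter {
    static Map<String, Integer> countByRemote(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by endpoint
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, short lines and sockets in any other state.
            if (cols.length < 6 || !cols[5].equals("CLOSE_WAIT")) {
                continue;
            }
            counts.merge(cols[4], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "tcp6 1 0 host:59402 host:52887 CLOSE_WAIT user 425517966 4396/java",
                "tcp6 1 0 host:60118 host:37612 CLOSE_WAIT user 426773630 4396/java",
                "tcp6 1 0 host:60154 host:37612 CLOSE_WAIT user 426810662 4396/java");
        System.out.println(countByRemote(sample)); // {host:37612=2, host:52887=1}
    }
}
```

Applied to the full listing above, it would report 18 sockets to host:37612 and 6 to host:52887.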

I dug around in SimpleHostRoutingFilter, which uses a PoolingHttpClientConnectionManager, and noticed something peculiar:

  • The TTL in the default configuration is set to -1, i.e. infinite.
  • The connections that are in CLOSE_WAIT get reused to serve new connection requests in line 318 of PoolingHttpClientConnectionManager (which I find extremely odd, although I am unsure whether this is a standard Java approach).
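To illustrate why a TTL of -1 matters, here is a hypothetical model of a per-connection validity deadline (`PooledEntryModel` is an invented name, not a class from Zuul or HttpClient), assuming that a non-positive TTL means "never expires" as described above. With such a setting, TTL alone never retires a connection, however long it has been sitting in CLOSE_WAIT.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Apache HttpClient code: each pooled connection gets a
// deadline when it is created, and a non-positive TTL means it never expires.
final class PooledEntryModel {
    private final long deadlineMillis;

    PooledEntryModel(long createdMillis, long timeToLive, TimeUnit unit) {
        this.deadlineMillis = timeToLive > 0
                ? createdMillis + unit.toMillis(timeToLive)
                : Long.MAX_VALUE; // e.g. TTL = -1: no deadline at all
    }

    /** True once the connection has outlived its TTL and should be discarded rather than reused. */
    boolean isExpired(long nowMillis) {
        return nowMillis >= deadlineMillis;
    }

    public static void main(String[] args) {
        long oneDayLater = TimeUnit.DAYS.toMillis(1);
        PooledEntryModel infinite = new PooledEntryModel(0, -1, TimeUnit.MILLISECONDS);
        PooledEntryModel bounded = new PooledEntryModel(0, 30, TimeUnit.SECONDS);
        System.out.println(infinite.isExpired(oneDayLater)); // false
        System.out.println(bounded.isExpired(oneDayLater));  // true
    }
}
```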

However, some connections remain in the CLOSE_WAIT state forever, and I cannot get rid of them. The other end of the route has no open connections left lying around; it is only Zuul that fails to close the connections in CLOSE_WAIT.
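CLOSE_WAIT means the remote side has already sent its FIN but the local application has not yet called close() on its socket, so only the local process (here, Zuul) can clear it. A minimal pure-JDK sketch reproducing that condition on the loopback interface; `CloseWaitDemo` is a hypothetical name:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Reproduces CLOSE_WAIT on loopback: the "server" hangs up first, which sends a
// FIN to the "client"; the client socket stays in CLOSE_WAIT until it is closed.
final class CloseWaitDemo {
    /** Returns the result of a read after the peer has closed: -1 means EOF. */
    static int readAfterPeerClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(5_000); // fail instead of hanging if no FIN arrives
            // Remote side accepts and immediately closes, sending a FIN.
            server.accept().close();
            // The FIN puts the client socket into CLOSE_WAIT; the application only
            // notices when a read returns EOF (-1). Returning closes the client via
            // try-with-resources, which moves it out of CLOSE_WAIT.
            return client.getInputStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterPeerClose()); // -1
    }
}
```

From the moment the FIN arrives until the client socket is closed, netstat would show the client side in CLOSE_WAIT; a pool that holds on to such a socket without closing it leaves it there indefinitely.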

Eventually these connections clog up the pool and I stop getting responses from my services altogether, and I have not seen Zuul clean them up even after more than a day.

What is also odd is that the cap seems to be 50 connections, although the maxPerRoute parameter is set to 20.

Is this a known issue? Is there a known workaround? I was planning to subclass or replace SimpleHostRoutingFilter with my own and pass it a connection pool manager configured with some TTL value to see whether that improves things, but since the effort required is non-trivial, I thought I should first ask whether this is a known issue.
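A TTL only bounds how long a connection may live; a common complementary pattern for pooled HTTP clients is a background task that sweeps the pool periodically. Below is a minimal pure-JDK sketch of that idea; `EvictablePool` and `PoolSweeper` are hypothetical names, and wiring them to a real pool is an assumption (in Apache HttpClient 4.x the connection manager exposes `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)`, which an adapter could call).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical abstraction over a connection pool; an adapter would map these
// calls onto the real pool's eviction methods.
interface EvictablePool {
    void closeExpired();                          // drop connections past their TTL
    void closeIdle(long idleTime, TimeUnit unit); // drop connections idle too long
}

// Sweeps the pool at a fixed rate so dead connections are closed before they
// can exhaust it.
final class PoolSweeper implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "pool-sweeper");
                t.setDaemon(true); // never keep the JVM alive just for sweeping
                return t;
            });

    PoolSweeper(EvictablePool pool, long periodMillis, long maxIdleMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                pool.closeExpired();
                pool.closeIdle(maxIdleMillis, TimeUnit.MILLISECONDS);
            } catch (RuntimeException e) {
                // An uncaught exception would silently cancel all future sweeps.
                System.err.println("pool sweep failed: " + e);
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger sweeps = new AtomicInteger();
        EvictablePool fakePool = new EvictablePool() {
            @Override public void closeExpired() { sweeps.incrementAndGet(); }
            @Override public void closeIdle(long idleTime, TimeUnit unit) { }
        };
        try (PoolSweeper sweeper = new PoolSweeper(fakePool, 10, 30_000)) {
            Thread.sleep(100); // let a few sweeps run
        }
        System.out.println("sweeps run: " + sweeps.get());
    }
}
```

An idle sweep can only close connections sitting unused in the pool; a connection that was leased and never released back to the pool is out of its reach.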

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 23 (14 by maintainers)

Top GitHub Comments

1 reaction
spencergibb commented, Apr 20, 2017

we’ll talk about it tomorrow morning.

0 reactions
ryanjbaxter commented, Apr 21, 2017

@tkvangorder we will most likely try to get the release done next week.

Top Results From Across the Web

Troubleshooting connections stuck in CLOSE_WAIT status
CLOSE_WAIT is the state the local TCP state machine is in when the remote host sends a FIN (closes its connection) but the...

This is strictly a violation of the TCP specification
The listening application leaks sockets, they are stuck in CLOSE_WAIT TCP state forever. These sockets look like (127.0.0.1:5000, 127.0.0.1:some ...

WebSphere Application Server HTTP plug-in has lingering ...
If any of the known connections are found to be in an unusable state like close_wait, the connection(s) will be reset. When there...

linux - Orphaned connections in CLOSE_WAIT state
No, there is no timeout for CLOSE_WAIT. I think that's what the off means in your output. To get out of CLOSE_WAIT...
