ConnectionPool cleanup is eager and results in unnecessary connection churn and (potentially) port exhaustion

Currently, OkHttp reaps “idle” connections (i.e. connections not in use) if either of the following is true:

  • a connection has been idle for at least the idle timeout (keepAliveDurationNs), or
  • the number of idle connections exceeds a threshold (maxIdleConnections)

      if (longestIdleDurationNs >= this.keepAliveDurationNs
          || idleConnectionCount > this.maxIdleConnections) {
        // We've found a connection to evict. Remove it from the list, then close it below (outside
        // of the synchronized block).
        connections.remove(longestIdleConnection);
      }

https://github.com/square/okhttp/blob/95ae0cf421c0f9c5521578781952108d1a1e1bdd/okhttp/src/main/java/okhttp3/ConnectionPool.java#L226-L230

This policy favors keeping the number of open connections close to the maxIdleConnections value.

We’ve run into situations internally where we have multiple threads using a single OkHttpClient instance to issue concurrent requests to an upstream (roughly 60 threads), but we observe that many of the connections (and underlying TCP sockets) are closed very soon after their initial use. This eventually leads to a situation where the host may run out of ephemeral ports (as the sockets are placed into TIME_WAIT for a period of time, and the port cannot be reused). Eventually connect() syscalls fail.
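
To get a feel for how quickly this can happen, here is a rough back-of-the-envelope sketch. The numbers are assumptions, not taken from the issue: a typical Linux default ephemeral port range of 32768 to 60999 and a 60 second TIME_WAIT period. The class and method names are hypothetical. Under those assumptions it estimates how many new connections per second a host can sustain to a single upstream address and port before it runs out of ports.

```java
// Hypothetical helper for illustration only; not part of OkHttp.
final class TimeWaitBudget {

  /**
   * Estimates the sustainable rate of new outbound connections to one
   * destination when every closed socket holds its port for timeWaitSeconds.
   */
  public static double maxNewConnectionsPerSecond(
      int lowPort, int highPort, int timeWaitSeconds) {
    int availablePorts = highPort - lowPort + 1; // range is inclusive
    return (double) availablePorts / timeWaitSeconds;
  }

  public static void main(String[] args) {
    // Assumed values: common Linux defaults, adjust for your own host.
    double rate = maxNewConnectionsPerSecond(32768, 60999, 60);
    System.out.printf("~%.0f new connections/second before exhaustion%n", rate);
  }
}
```

With these assumed defaults the budget works out to roughly 470 new connections per second, which a busy client that keeps closing and reopening connections could plausibly exceed.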

One workaround is to set maxIdleConnections high enough to allow connections to be reused in high request-rate environments. However, these connections will not be closed when the request rate falls.
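
A minimal sketch of that workaround, assuming OkHttp 3.x on the classpath. The class name and the specific values (100 idle connections, 5 minute keep-alive) are hypothetical and should be sized to your own concurrency:

```java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Hypothetical factory for illustration; values below are assumptions.
final class PooledClientFactory {

  // Chosen to comfortably exceed the ~60 concurrent request threads
  // described above, so idle connections are not evicted between requests.
  static final int MAX_IDLE_CONNECTIONS = 100;
  static final long KEEP_ALIVE_MINUTES = 5;

  public static OkHttpClient create() {
    ConnectionPool pool =
        new ConnectionPool(MAX_IDLE_CONNECTIONS, KEEP_ALIVE_MINUTES, TimeUnit.MINUTES);
    return new OkHttpClient.Builder()
        .connectionPool(pool)
        .build();
  }
}
```

The client returned by create() should be shared across threads; building a new client (and pool) per request would defeat connection reuse.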

Maybe this is by design? I’d posit that the principle of least surprise would have one assume that a connection is only closed if it has been idle for some extended period of time. Indeed that has caught a few of us by surprise.

An alternative we’d like to propose is that connections are closed only if an eligible connection exists and the maxIdleConnections value has been exceeded, i.e. change the OR to an AND in the code above. This would allow the connection pool to grow with the load during bursts, and to shrink when the connections are not being utilized.
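
The difference between the two conditions can be sketched as a pair of standalone predicates. This is an illustration only, not OkHttp's code: the class and method names are hypothetical, and the example values (60 idle connections, a limit of 5, a 5 minute keep-alive) are assumptions.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch comparing the two eviction conditions.
final class EvictionPolicySketch {

  /** Condition as quoted above: evict if EITHER check holds. */
  public static boolean shouldEvictCurrent(
      long longestIdleNs, long keepAliveNs, int idleCount, int maxIdle) {
    return longestIdleNs >= keepAliveNs || idleCount > maxIdle;
  }

  /** Proposed condition: evict only if BOTH checks hold. */
  public static boolean shouldEvictProposed(
      long longestIdleNs, long keepAliveNs, int idleCount, int maxIdle) {
    return longestIdleNs >= keepAliveNs && idleCount > maxIdle;
  }

  public static void main(String[] args) {
    long keepAliveNs = TimeUnit.MINUTES.toNanos(5);
    long idleForNs = TimeUnit.SECONDS.toNanos(1); // connection idle for 1 second
    int idleCount = 60;                           // burst left 60 idle connections
    int maxIdle = 5;

    System.out.println("current evicts:  "
        + shouldEvictCurrent(idleForNs, keepAliveNs, idleCount, maxIdle));
    System.out.println("proposed evicts: "
        + shouldEvictProposed(idleForNs, keepAliveNs, idleCount, maxIdle));
  }
}
```

With these inputs the OR form evicts a connection that was used one second ago, while the AND form keeps it. Note that the AND form would also keep a connection that has been idle past the keep-alive duration for as long as the idle count stays at or below the limit.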

Here’s a small reproducer we put together where we observe the port exhaustion issue: https://github.com/nicktrav/okhttp/commits/nickt.time-wait-issue

This has also been mentioned here, but the comments didn’t seem to point at a resolution: https://stackoverflow.com/questions/41011287/why-okhttp-doesnt-reuse-its-connections

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
swankjesse commented, Nov 2, 2018

We really need to close connections that have exceeded the keep alive duration. We evict these connections not to conserve resources but to prevent failures later caused by use of stale connections.

I think the right fix here is to dramatically increase the maxIdleConnections; perhaps to 1,000 or more. Closing non-stale connections is merely about conserving memory, and memory is cheap! Probably about 64 KiB per socket, depending on system buffer sizes.

And we should change the default value in OkHttp also. Perhaps a default of 64? That would be 4 MiB of memory if my socket math is right, which seems in the right order of magnitude.
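
The socket math can be checked with a short sketch. It takes the 64 KiB per-socket figure from the comment above as given (actual usage depends on system buffer sizes), and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only.
final class IdlePoolMemory {

  // Per-socket estimate quoted in the comment; an approximation, not a measurement.
  static final long BYTES_PER_SOCKET = 64L * 1024;

  /** Estimated memory held by the given number of idle connections. */
  public static long estimateBytes(int idleConnections) {
    return idleConnections * BYTES_PER_SOCKET;
  }

  public static void main(String[] args) {
    double mib = 1024.0 * 1024.0;
    System.out.printf("64 idle:   %.1f MiB%n", estimateBytes(64) / mib);
    System.out.printf("1000 idle: %.1f MiB%n", estimateBytes(1000) / mib);
  }
}
```

Under that estimate, 64 idle connections come to 4 MiB, matching the comment, and 1,000 idle connections come to about 62.5 MiB.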

0 reactions
swankjesse commented, Nov 17, 2018

Punting to the next release ’cause there’s no code changes here.
