Infinite loop in connection retry

For some reason (I haven’t been able to reproduce this locally yet), under heavy load some of our OkHttp clients can get into an infinite loop (eating up one or more CPU cores) when connection failure retry is enabled. It seems to affect only some of the connections in the connection pool.

If I disable connection failure retry then a SocketException is thrown at https://github.com/square/okhttp/blob/master/okhttp/src/main/java/okhttp3/internal/connection/RealConnection.kt#L594 with the following stack trace:

java.net.SocketException: Socket is closed
    at java.net.Socket.setSoTimeout(Unknown Source)
    at sun.security.ssl.BaseSSLSocketImpl.setSoTimeout(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.setSoTimeout(Unknown Source)
    at okhttp3.internal.connection.RealConnection.newCodec$okhttp(RealConnection.kt:594)
    at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:83)
    at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:245)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:82)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:74)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:197)
    at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:502)

In this case a certain percentage of requests fail consistently, which suggests that a faulty pooled connection is stuck and never evicted.

The only option is to restart the affected application.

I think the cause is related to request cancellation and concurrency. Could the socket be closed by a different thread right after it is created?
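
To make the hypothesis concrete, here is a minimal JDK-only sketch of the failing call at the top of the stack trace. It does not reproduce the OkHttp bug or involve any OkHttp code; it only shows that calling setSoTimeout on a socket that has already been closed raises a SocketException. The class and method names (ClosedSocketDemo, timeoutErrorOnClosedSocket) are made up for illustration, and the explicit close() stands in for whatever thread might be closing the socket during cancellation.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

public class ClosedSocketDemo {
    /**
     * Closes a socket, then tries to set its read timeout.
     * Returns the exception message, or null if no exception was thrown.
     */
    static String timeoutErrorOnClosedSocket() throws IOException {
        Socket socket = new Socket();   // unconnected, so no network access is needed
        socket.close();                 // assumption: stands in for a concurrent cancel/close
        try {
            socket.setSoTimeout(1_000); // same JDK call as the top frame of the trace
            return null;
        } catch (SocketException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(timeoutErrorOnClosedSocket());
    }
}
```

If the socket really is closed between creation and the timeout being set, the message reported here should match the one in the trace above. Whether that is what happens inside the connection pool is still only a guess.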

This issue has been observed since at least 3.14.0. It also happens with the latest version, 4.4.0.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 21 (1 by maintainers)

Top GitHub Comments

1 reaction
danielimre commented, Apr 20, 2020

Thanks for the release. Haven’t seen the issue with 4.5.0 for a week now.

1 reaction
dave-r12 commented, Mar 31, 2020

@danielimre ahh right. I spotted one code path that might explain this. But it’s pretty extraordinary.

When the application gets into this state, are any requests making it to the server? Or do none make it?
