Infinite loop in connection retry
See original GitHub issueFor some reason (I haven’t been able to reproduce this locally yet), under heavy load some of our okhttp clients might get into an infinite loop (eating up one or more CPU cores) when connection failure retry is enabled. It seems this affects some connections only in the connection pool.
If I disable connection failure retry then a SocketException
is thrown at https://github.com/square/okhttp/blob/master/okhttp/src/main/java/okhttp3/internal/connection/RealConnection.kt#L594 with the following stack trace:
java.net.SocketException: Socket is closed at java.net.Socket.setSoTimeout(Unknown Source)
at sun.security.ssl.BaseSSLSocketImpl.setSoTimeout(Unknown Source)
at sun.security.ssl.SSLSocketImpl.setSoTimeout(Unknown Source)
at okhttp3.internal.connection.RealConnection.newCodec$okhttp(RealConnection.kt:594)
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:83)
at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:245)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:82)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:74)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:197)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:502)
In this case a certain percentage of the requests fail constantly suggesting a stuck pooled faulty connection which is never evicted.
The only option is to restart the affected application.
I think the cause is related to request cancellation and concurrency. Might be the socket is closed right after creation by a different thread?
This issue has been observed since at least 3.14.0. Also happening with latest: 4.4.0
Issue Analytics
- State:
- Created 4 years ago
- Comments:21 (1 by maintainers)
Top GitHub Comments
Thanks for the release. Haven’t seen the issue with 4.5.0 for a week now.
@danielimre ahh right. I spotted one code path that might explain this. But it’s pretty extraordinary.
When the application gets in this state are any requests making it to the server? Or no requests make it?