Indefinite blocks on NativeCrypto.SSL_do_handshake
We have user reports from Google Cloud Dataproc that threads/tasks intermittently hang on NativeCrypto.SSL_do_handshake. Some Java and native thread dumps were shared in GoogleCloudDataproc/hadoop-connectors/issues/153, but so far we have been unable to produce a solid reproduction.
Dataproc currently uses Conscrypt 1.4.2 on java-8-openjdk-amd64. On the OS side, we have heard reports from both Debian 9 (Debian-provided OpenJDK) and Debian 10 (AdoptOpenJDK). One user told GCP Support that they still saw the same hang even after manually updating to Conscrypt 2.4.0.
Below is an example of the thread dump.
"gcsfs-batch-helper-3912" #4130 daemon prio=5 os_prio=0 tid=0x00007f92a4040800 nid=0x3130 runnable [0x00007f9330815000]
java.lang.Thread.State: RUNNABLE
at org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:392)
at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:474)
at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:461)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
- locked <0x0000000651a1d8a8> (a sun.net.www.protocol.https.HttpsClient)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.execute(BatchHelper.java:175)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.lambda$queue$0(BatchHelper.java:163)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper$$Lambda$96/631118845.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Issue Analytics
- State:
- Created 3 years ago
- Reactions:3
- Comments:8 (4 by maintainers)
Top GitHub Comments
Issue is still there in Conscrypt 2.5.1
"pool-7-thread-4" #33 prio=5 os_prio=0 tid=0x00007f3b383f3800 nid=0x35fe runnable [0x00007f3b3ca4e000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:920)
at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:884)
at org.conscrypt.ConscryptEngineSocket$SSLInputStream.access$100(ConscryptEngineSocket.java:706)
at org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:230)
at org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:209)
- locked <0x00000000bc87ad60> (a java.lang.Object)
at org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:547)
at org.conscrypt.ConscryptEngineSocket.getOutputStream(ConscryptEngineSocket.java:290)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
- locked <0x00000000bc87ad70> (a sun.net.www.protocol.https.HttpsClient)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
at net.elastica.discoveryutil.ServiceDiscovery.getHttpsConnection(ServiceDiscovery.java:169)
at net.elastica.discoveryutil.ServiceDiscovery.getServiceFromIP(ServiceDiscovery.java:183)
at net.elastica.audit.ServiceIpToServiceIdResolverJob$ServiceMap$1.run(ServiceIpToServiceIdResolverJob.java:297)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I think I’ve found the root cause. It requires a careful reading of the sun.net.www JDK classes and how they interact with Conscrypt classes. For reference, here is the Java version used in my tests:
All of the following stack traces and code walkthrough refer to OpenJDK 8u322 and Conscrypt 2.5.2.
Following are stack traces from running a simple test harness that uses HttpsURLConnection with a 20-second read timeout. To simulate a network fault, I’m applying an iptables rule on connections to a GCS IP address and then using /etc/hosts to route all GCS connections to that IP address:
The iptables rule lets the initial SYN/ACK through so that the connection is established, but all subsequent inbound packets are dropped before reaching the application layer. We’d expect the first socket read during the TLS handshake to time out after 20 seconds. Instead, the read hangs indefinitely.
First, here is the test using Conscrypt, and running jstack on the process after it hangs:
Second, here is the test using the default security provider, resulting in an exception after the 20-second read timeout:
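The harness itself is not reproduced in this thread. A minimal sketch of what such a harness might look like follows, assuming a plain HttpsURLConnection with the read timeout set before connect(). The class name ReadTimeoutHarness, the open helper, and the target URL are hypothetical, and the commented-out provider installation assumes the Conscrypt jar is on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import javax.net.ssl.HttpsURLConnection;

public class ReadTimeoutHarness {
    // Assumption: any GCS HTTPS endpoint would do; this URL is illustrative only.
    static final String TARGET = "https://storage.googleapis.com/";
    // The 20-second read timeout described in the text.
    static final int READ_TIMEOUT_MILLIS = 20_000;

    /** Creates (but does not connect) an HTTPS connection with the given read timeout. */
    static HttpsURLConnection open(String url, int readTimeoutMillis) {
        try {
            HttpsURLConnection conn =
                (HttpsURLConnection) URI.create(url).toURL().openConnection();
            conn.setReadTimeout(readTimeoutMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // To exercise Conscrypt instead of the default provider, one would install it
        // first (assumes the Conscrypt dependency is available), for example:
        // java.security.Security.insertProviderAt(org.conscrypt.Conscrypt.newProvider(), 1);
        HttpsURLConnection conn = open(args.length > 0 ? args[0] : TARGET, READ_TIMEOUT_MILLIS);
        long start = System.nanoTime();
        try {
            conn.connect(); // the TLS handshake happens somewhere inside this call
            System.out.println("HTTP " + conn.getResponseCode());
        } catch (IOException e) {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Failed after " + elapsedMillis + " ms: " + e);
        } finally {
            conn.disconnect();
        }
    }
}
```

Run against an endpoint whose inbound packets are dropped after the TCP handshake, the catch branch is where the default provider would be expected to report the read timeout, while the Conscrypt case described here would never reach it.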
It’s interesting to note that both processes are performing the TLS handshake, but they are doing so from different points of execution in AbstractDelegateHttpsURLConnection#connect. With Conscrypt, we are on line 189, invoking HttpURLConnection#plainConnect. With the default security provider, we are past that and down to line 197, calling HttpsClient#afterConnect.
Execution with Conscrypt proceeds into HttpURLConnection#plainConnect0 line 1162. However, note that the read timeout is not applied until line 1163, after this call completes!
Thus, it appears that Conscrypt, unlike the default security provider, performs the TLS handshake at a time when the read timeout has not yet been applied. The sun.net.www classes seem to have a hidden assumption that the TLS handshake will not be performed until HttpsClient#afterConnect. Conscrypt instead triggers the TLS handshake earlier, in ConscryptEngineSocket#getOutputStream. The default provider’s SSLSocketImpl#getOutputStream does not trigger the TLS handshake by side effect like this.
To summarize:
- HttpClient seems to assume that it can call getOutputStream() before completing the TLS handshake.
- In Conscrypt, getOutputStream() starts the TLS handshake by side effect.
- HttpURLConnection does not apply the socket read timeout until after this point of execution.
It doesn’t appear that there is any Java upgrade path that would resolve this issue. Reviewing git diff jdk8u332-ga..master, I don’t see any recent changes in sun.net.www that would change this behavior. Similarly, more recent JVMs retain the same behavior, e.g. Java 17: https://github.com/openjdk/jdk17u/blob/c1e17197222932a03a04d3b8d9c0d7f94be07947/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1242