Indefinite blocks on NativeCrypto.SSL_do_handshake

We have user reports from Google Cloud Dataproc that threads (tasks) intermittently hang on NativeCrypto.SSL_do_handshake. Some Java and native thread dumps were shared in GoogleCloudDataproc/hadoop-connectors/issues/153, but so far we have been unable to produce a reliable reproduction.

Dataproc currently uses Conscrypt 1.4.2 on java-8-openjdk-amd64. On the OS side, we have received reports from both Debian 9 (Debian-provided OpenJDK) and Debian 10 (AdoptOpenJDK). One user told GCP Support that they still saw the same hang after manually updating to Conscrypt 2.4.0.

Below is an example thread dump:

"gcsfs-batch-helper-3912" #4130 daemon prio=5 os_prio=0 tid=0x00007f92a4040800 nid=0x3130 runnable [0x00007f9330815000]
   java.lang.Thread.State: RUNNABLE
	at org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
	at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:392)
	at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
	at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:474)
	at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:461)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	- locked <0x0000000651a1d8a8> (a sun.net.www.protocol.https.HttpsClient)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.execute(BatchHelper.java:175)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.lambda$queue$0(BatchHelper.java:163)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper$$Lambda$96/631118845.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
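When you need to triage many dumps for this signature, it can help to flag the matching threads automatically. The following is a minimal sketch and not a Dataproc or Conscrypt tool. The class HandshakeHangScanner and its method suspectThreads are hypothetical names. The frame list is taken only from the dumps quoted on this page. The sketch also assumes jstack's usual layout, in which each thread entry is a block separated by blank lines. It reports a thread when a Conscrypt handshake frame appears together with waitForHandshake and getOutputStream, which is the call chain shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HandshakeHangScanner {

    // Handshake frames seen in the dumps on this page (not an exhaustive list).
    private static final String[] HANDSHAKE_FRAMES = {
        "org.conscrypt.NativeCrypto.SSL_do_handshake",      // ConscryptFileDescriptorSocket path
        "org.conscrypt.ConscryptEngineSocket.doHandshake",  // ConscryptEngineSocket path
    };

    /** Returns names of threads in a Conscrypt handshake entered via waitForHandshake/getOutputStream. */
    static List<String> suspectThreads(String jstackOutput) {
        List<String> suspects = new ArrayList<>();
        // jstack separates thread entries with blank lines.
        for (String rawEntry : jstackOutput.split("\\R\\s*\\R")) {
            String entry = rawEntry.trim();
            if (!entry.startsWith("\"")) {
                continue; // not a thread entry (e.g. the dump header)
            }
            boolean inHandshake = false;
            for (String frame : HANDSHAKE_FRAMES) {
                inHandshake |= entry.contains(frame);
            }
            boolean fromGetOutputStream =
                    entry.contains(".waitForHandshake(") && entry.contains(".getOutputStream(");
            int endQuote = entry.indexOf('"', 1);
            if (inHandshake && fromGetOutputStream && endQuote > 0) {
                suspects.add(entry.substring(1, endQuote)); // the quoted thread name
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HandshakeHangScanner jstack-output.txt
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        suspectThreads(dump).forEach(System.out::println);
    }
}
```

A handshake that is legitimately in progress matches too, so a flagged thread is not necessarily stuck. Comparing several dumps taken some seconds apart is a safer signal.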

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
samwiise commented, Jan 22, 2021

The issue is still present in Conscrypt 2.5.1:

"pool-7-thread-4" #33 prio=5 os_prio=0 tid=0x00007f3b383f3800 nid=0x35fe runnable [0x00007f3b3ca4e000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:920)
	at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:884)
	at org.conscrypt.ConscryptEngineSocket$SSLInputStream.access$100(ConscryptEngineSocket.java:706)
	at org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:230)
	at org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:209)
	- locked <0x00000000bc87ad60> (a java.lang.Object)
	at org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:547)
	at org.conscrypt.ConscryptEngineSocket.getOutputStream(ConscryptEngineSocket.java:290)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	- locked <0x00000000bc87ad70> (a sun.net.www.protocol.https.HttpsClient)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
	at net.elastica.discoveryutil.ServiceDiscovery.getHttpsConnection(ServiceDiscovery.java:169)
	at net.elastica.discoveryutil.ServiceDiscovery.getServiceFromIP(ServiceDiscovery.java:183)
	at net.elastica.audit.ServiceIpToServiceIdResolverJob$ServiceMap$1.run(ServiceIpToServiceIdResolverJob.java:297)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

0 reactions
cnauroth commented, Aug 19, 2022

I think I’ve found the root cause. It requires a careful reading of the sun.net.www JDK classes and how they interact with Conscrypt classes. For reference, here is the Java version used in my tests:

openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Temurin)(build 25.332-b09, mixed mode)

All of the following stack traces and the code walkthrough refer to OpenJDK 8u332 and Conscrypt 2.5.2.

The following stack traces come from a simple test harness that uses HttpsURLConnection with a 20-second read timeout. To simulate a network fault, I’m applying an iptables rule to connections to a GCS IP address and then using /etc/hosts to route all GCS connections to that IP address:

IP=$(nslookup storage.googleapis.com | grep -E 'Address: [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -1 | awk '{ print $NF }')

sudo iptables -A OUTPUT -p tcp --source 10.240.1.19 --destination "${IP}" -m state --state ESTABLISHED -j DROP

cnauroth@cnauroth-conscrypt-on-m:~$ cat /etc/hosts
127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters

108.177.111.128 storage.googleapis.com

10.240.0.199 cnauroth-conscrypt-on-m.us-central1-c.c.hadoop-cloud-dev.google.com.internal cnauroth-conscrypt-on-m  # Added by Google
169.254.169.254 metadata.google.internal  # Added by Google

The iptables rule lets the initial SYN/SYN-ACK exchange complete, so the connection is established. After that, it drops every subsequent outbound packet to that address, so no TLS handshake response ever reaches the application layer. We’d expect the first socket read during the TLS handshake to time out after 20 seconds. Instead, the read hangs indefinitely.
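The same fault can be approximated without iptables. The following minimal sketch assumes only the JDK's default JSSE provider, with Conscrypt not on the classpath. The class UnresponsiveTlsPeerDemo and its methods are hypothetical names, and Proxy.NO_PROXY is added so that proxy settings don't interfere. A loopback ServerSocket that never calls accept() stands in for the DROP rule: the kernel completes the TCP handshake from the listen backlog, but no TLS response ever arrives. Like the harness described above, the sketch sets a read timeout and calls getInputStream(), then reports how long the call took to fail.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class UnresponsiveTlsPeerDemo {

    /** Returns how long getInputStream() took to fail, in ms, or -1 if it unexpectedly succeeded. */
    static long timeToFailureMillis(URL url, int readTimeoutMillis) {
        long start = System.nanoTime();
        try {
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setReadTimeout(readTimeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                return -1; // a server answered; not expected against the silent listener
            }
        } catch (IOException e) {
            // With the default provider this should be the read timeout hit during the
            // handshake (SSLException or SocketTimeoutException, depending on the JDK).
            System.out.println("getInputStream() failed: " + e);
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    /** Runs the harness against a loopback listener that never accepts or answers. */
    static long runAgainstSilentListener(int readTimeoutMillis) {
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            // accept() is never called: the connection sits in the backlog, so the
            // client's ClientHello is never answered.
            URL url = new URL("https://127.0.0.1:" + silent.getLocalPort() + "/");
            return timeToFailureMillis(url, readTimeoutMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long elapsed = runAgainstSilentListener(2_000);
        System.out.println("Failed after ~" + elapsed + " ms");
    }
}
```

With the default provider, the call should fail at roughly the configured timeout, consistent with the default-provider trace later in this comment. The thread reports that the equivalent call hangs instead when Conscrypt is installed as the preferred provider. This sketch does not exercise that case.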

First, here is the jstack output for the test using Conscrypt, captured after the process hangs:

"main" #1 prio=5 os_prio=0 tid=0x00007fc3f800a800 nid=0x54ac runnable [0x00007fc3ff14a000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:920)
	at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:884)
	at org.conscrypt.ConscryptEngineSocket$SSLInputStream.access$100(ConscryptEngineSocket.java:706)
	at org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:230)
	at org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:209)
	- locked <0x0000000772a610a0> (a java.lang.Object)
	at org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:547)
	at org.conscrypt.ConscryptEngineSocket.getOutputStream(ConscryptEngineSocket.java:290)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	- locked <0x0000000772a3da00> (a sun.net.www.protocol.https.HttpsClient)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572)
	- locked <0x0000000772a39df8> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
	- locked <0x0000000772a39df8> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
	- locked <0x0000000771eadb40> (a sun.net.www.protocol.https.HttpsURLConnectionImpl)
	at TestHttpsURLConnection.main(TestHttpsURLConnection.java:17)

Second, here is the test using the default security provider, which fails with an exception after the 20-second read timeout:

Exception in thread "main" javax.net.ssl.SSLException: Read timed out
	at sun.security.ssl.Alert.createSSLException(Alert.java:127)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:138)
	at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1397)
	at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1305)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:197)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
	at TestHttpsURLConnection.main(TestHttpsURLConnection.java:18)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
	at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:165)
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:109)
	... 9 more

Note that both processes are performing the TLS handshake, but from different points of execution in AbstractDelegateHttpsURLConnection#connect. With Conscrypt, we are on line 189, invoking HttpURLConnection#plainConnect. With the default security provider, execution has already moved past that point to line 197, calling HttpsClient#afterConnect.

With Conscrypt, execution proceeds into HttpURLConnection#plainConnect0 at line 1162. Note, however, that the read timeout is not applied until line 1163, after that call completes:

    http = getNewHttpClient(url, p, connectTimeout);
    http.setReadTimeout(readTimeout);

Thus, it appears that Conscrypt, unlike the default security provider, performs the TLS handshake at a point where the read timeout has not yet been applied. The sun.net.www classes seem to carry a hidden assumption that the TLS handshake will not be performed until HttpsClient#afterConnect. Conscrypt instead triggers the handshake earlier, in ConscryptEngineSocket#getOutputStream. The default provider’s SSLSocketImpl#getOutputStream does not trigger the handshake as a side effect like this.

To summarize:

  • The Sun HttpClient seems to assume that it can call getOutputStream() before the TLS handshake has completed.
  • This assumption is safe with the default security provider.
  • With Conscrypt, the call to getOutputStream() starts the TLS handshake as a side effect.
  • HttpURLConnection does not apply the socket read timeout until after this point of execution.
  • Therefore, Conscrypt performs the TLS handshake without the read timeout enforced.
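The ordering problem summarized above does not depend on TLS itself, and it can be sketched with plain sockets. In this minimal sketch, EagerHandshakeSocket is a hypothetical Socket subclass whose first getOutputStream() call blocks on a read. It stands in for Conscrypt's handshake-on-getOutputStream behavior. simulateOpenServer is a hypothetical method that only loosely mirrors the order of operations in HttpClient#openServer and HttpURLConnection#plainConnect0. Neither reproduces the real JDK or Conscrypt code. The peer is a loopback listener that never answers. When SO_TIMEOUT is set before getOutputStream(), the simulated handshake read times out. When it is set only afterwards, the read is still blocked well past the timeout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class HandshakeOrderingDemo {

    /** Hypothetical socket whose first getOutputStream() blocks on a read, like an eager handshake. */
    static class EagerHandshakeSocket extends Socket {
        private volatile boolean handshakeDone;

        @Override
        public OutputStream getOutputStream() throws IOException {
            if (!handshakeDone) {
                // Stand-in for the handshake: wait for the peer's first byte.
                // The read honors whatever SO_TIMEOUT is set at this moment.
                getInputStream().read();
                handshakeDone = true;
            }
            return super.getOutputStream();
        }
    }

    /**
     * Connects to a silent loopback listener and calls getOutputStream() with the read
     * timeout applied before (applyTimeoutFirst = true) or only after (false) that call.
     * Returns "timed out", "completed", or "hung" (still blocked after 5x the timeout).
     */
    static String simulateOpenServer(boolean applyTimeoutFirst, int readTimeoutMillis) {
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
             EagerHandshakeSocket socket = new EagerHandshakeSocket()) {
            socket.connect(new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort()));
            AtomicReference<String> outcome = new AtomicReference<>("hung");
            Thread client = new Thread(() -> {
                try {
                    if (applyTimeoutFirst) {
                        socket.setSoTimeout(readTimeoutMillis);
                    }
                    socket.getOutputStream(); // triggers the simulated handshake
                    if (!applyTimeoutFirst) {
                        socket.setSoTimeout(readTimeoutMillis); // too late to affect that read
                    }
                    outcome.set("completed");
                } catch (SocketTimeoutException e) {
                    outcome.set("timed out");
                } catch (IOException e) {
                    // Expected when close() below releases a hung read.
                }
            });
            client.start();
            client.join(5L * readTimeoutMillis); // wait well past the timeout
            boolean stillBlocked = client.isAlive();
            socket.close(); // release a hung read so the demo can finish
            client.join();
            return stillBlocked ? "hung" : outcome.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout before getOutputStream(): " + simulateOpenServer(true, 500));
        System.out.println("timeout after getOutputStream():  " + simulateOpenServer(false, 500));
    }
}
```

The background thread and the final close() exist only so that the demo terminates. In the reported hang, nothing closes the socket, so the handshake read blocks indefinitely.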

There doesn’t appear to be any Java upgrade path that would resolve this issue. Reviewing git diff jdk8u332-ga..master, I don’t see any recent changes in sun.net.www that would alter this behavior. More recent JDKs retain the same behavior, e.g. Java 17:

https://github.com/openjdk/jdk17u/blob/c1e17197222932a03a04d3b8d9c0d7f94be07947/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1242
