SSL request on Databricks fails with handshake_failure
Problem Description
I’m sending metrics from a Spark job on Azure Databricks to Datadog. The library I’m using (kamon-datadog @ https://github.com/kamon-io/Kamon/tree/v2.1.1) uses OkHttp3 (version 3.14.7, but the problem also occurs on 4.7.2) to communicate with the Datadog API at https://api.datadoghq.com/
Running the Spark job locally works without any problems. Running it on Databricks fails with a handshake_failure error. After quite extensive debugging I was able to pinpoint the problem to the combination of OkHttp3 and Databricks.
How to reproduce
I can reliably reproduce the issue by starting a new notebook in Azure Databricks on a cluster (I tested Databricks 6.5 (Spark 2.4.5 / Scala 2.11 / Java 1.8.0_242) and 7.0 (Spark 3.0 / Scala 2.12 / Java 1.11.655)) with the OkHttp3 4.7.2 library installed (or any older version I tried), and running the following code:
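import okhttp3.{OkHttpClient, Request}

// Default client, so OkHttp's built-in TLS connection specs apply
val cl: OkHttpClient = new OkHttpClient()
val req = new Request.Builder()
  .url("https://api.datadoghq.com/account/login?next=%2F")
  .method("GET", null)
  .build()
// Throws SSLHandshakeException on Databricks; returns a response locally
cl.newCall(req).execute().toString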
Stacktrace
This is the full stacktrace I receive (on Java 8):
javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1127)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.kt:367)
at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.kt:325)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:197)
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:249)
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:108)
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:76)
at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:245)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:96)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:197)
at okhttp3.internal.connection.RealCall.execute(RealCall.kt:148)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2919730791141205:21)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2919730791141205:71)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw$$iw.<init>(command-2919730791141205:73)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw.<init>(command-2919730791141205:75)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw.<init>(command-2919730791141205:77)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw.<init>(command-2919730791141205:79)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read.<init>(command-2919730791141205:81)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$.<init>(command-2919730791141205:85)
at line4f279a02c14c4d08aec3bb073b25d6e927.$read$.<clinit>(command-2919730791141205)
at line4f279a02c14c4d08aec3bb073b25d6e927.$eval$.$print$lzycompute(<notebook>:7)
at line4f279a02c14c4d08aec3bb073b25d6e927.$eval$.$print(<notebook>:6)
at line4f279a02c14c4d08aec3bb073b25d6e927.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
at scala.util.Try$.apply(Try.scala:192)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
Workaround
I found a similar issue on Stack Overflow: https://stackoverflow.com/questions/57297159/databricks-job-getting-javax-net-ssl-sslhandshakeexception-received-fatal-alert
Based on the “solution” there, I tried swapping out the request to Unirest, which works without any issues, both locally and on Databricks.
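For anyone wanting a dependency-free comparison before switching libraries, here is a minimal sketch that sends the same GET through the JDK's built-in HttpsURLConnection. This is not what was used above (that was Unirest). The object name JdkTlsCheck and the 10-second timeouts are illustrative assumptions. If this call succeeds on the cluster while OkHttp fails, the difference is probably in how each client selects cipher suites rather than in the network path.

```scala
import java.net.URL
import javax.net.ssl.HttpsURLConnection
import scala.util.Try

// Hypothetical helper, not part of OkHttp, Unirest or Databricks.
object JdkTlsCheck {
  /** Performs a GET with the JDK's built-in HTTPS client and returns the status code. */
  def status(url: String): Try[Int] = Try {
    val conn = new URL(url).openConnection().asInstanceOf[HttpsURLConnection]
    conn.setConnectTimeout(10000) // assumed timeout, adjust as needed
    conn.setReadTimeout(10000)
    // The TLS handshake happens when the response code is requested
    try conn.getResponseCode finally conn.disconnect()
  }

  def main(args: Array[String]): Unit =
    println(status("https://api.datadoghq.com/account/login?next=%2F"))
}
```

A Failure wrapping an SSLHandshakeException here would suggest the problem is not limited to OkHttp.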
In conclusion
It seems that something in the combination of Databricks and OkHttp3 is making specific SSL connections fail. Datadog happens to fail, but several other sites I tried work just fine using OkHttp on Databricks.
If there’s any other info I can provide, let me know.
Issue Analytics
- Created 3 years ago
- Comments:11 (3 by maintainers)
Top GitHub Comments
Since I found this issue via Google when working on a similar problem with Databricks, I figured I’d share some explanation for those in this thread.
Databricks sets java.security.properties=/databricks/spark/dbconf/java/extra.security. The file contains a single line, which overrides jdk.tls.disabledAlgorithms. The default value of jdk.tls.disabledAlgorithms in Java 8 is documented at https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html
For OkHttp users on Databricks with SSL handshake issues potentially related to cipher support, you may need to change that file. (In my case, disallowing GCM was the issue, and it seems they may have done that because of performance issues with GCM in Java 8 at the time.) See my solution for using an init script for the Databricks cluster: https://github.com/neo4j-contrib/neo4j-spark-connector/issues/300#issuecomment-788300727
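To see what a given cluster actually ends up with, a small diagnostic along these lines can be run in a notebook or as a job. It is a sketch that uses only JDK APIs. The object name TlsDiagnostics and the helper parseDisabled are illustrative, and it assumes that the JVM's default SSL parameters reflect the disabled-algorithms setting.

```scala
import java.security.Security
import javax.net.ssl.SSLContext

// Hypothetical helper for inspecting the JVM's effective TLS configuration.
object TlsDiagnostics {
  /** Splits a comma-separated security property value into trimmed, non-empty entries. */
  def parseDisabled(value: String): Seq[String] =
    Option(value).toSeq.flatMap(_.split(",").toSeq).map(_.trim).filter(_.nonEmpty)

  /** Cipher suites the JVM enables by default (assumed to reflect disabled algorithms). */
  def enabledSuites(): Seq[String] =
    SSLContext.getDefault.getDefaultSSLParameters.getCipherSuites.toSeq

  def main(args: Array[String]): Unit = {
    val disabled = parseDisabled(Security.getProperty("jdk.tls.disabledAlgorithms"))
    println(s"jdk.tls.disabledAlgorithms: ${disabled.mkString(", ")}")
    val suites = enabledSuites()
    println(s"Enabled suites: ${suites.size}, of which GCM: ${suites.count(_.contains("GCM"))}")
  }
}
```

If the GCM count is zero on the cluster but not locally, that would be consistent with the explanation above.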
Thanks for the update and explanation.
Reading that last post, it seems that two opinionated frameworks are incompatible by default.
This isn’t really a bug, because you can always opt OkHttp back into some accepted ciphers, but it explains why OkHttp fails by default while Unirest works. Glad the performance issue with GCM seems to be fixed in 8+ now:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8177784 https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8201633
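One way to opt OkHttp back into additional ciphers, without editing the cluster's security file, might look like the sketch below. It starts from OkHttp's MODERN_TLS spec and calls allEnabledCipherSuites() so that the client offers every suite the JVM has enabled, not only OkHttp's approved list. The object name PermissiveOkHttp is illustrative. Whether this resolves a particular handshake failure depends on which suites the server accepts, so treat it as something to try rather than a confirmed fix.

```scala
import java.util.Arrays
import okhttp3.{ConnectionSpec, OkHttpClient}

// Hypothetical wrapper; only the okhttp3 calls are real library API.
object PermissiveOkHttp {
  // Keep OkHttp's default TLS versions, but accept every cipher suite
  // the JVM has enabled instead of OkHttp's own approved list.
  val spec: ConnectionSpec = new ConnectionSpec.Builder(ConnectionSpec.MODERN_TLS)
    .allEnabledCipherSuites()
    .build()

  // CLEARTEXT is kept so that plain http:// URLs still work, as with the default client.
  val client: OkHttpClient = new OkHttpClient.Builder()
    .connectionSpecs(Arrays.asList(spec, ConnectionSpec.CLEARTEXT))
    .build()
}
```

The trade-off is that the client may now negotiate suites that OkHttp deliberately leaves out, so it is worth limiting this client to the hosts that need it.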