SSL request on Databricks fails with handshake_failure

Problem Description

I’m sending metrics from a Spark job on Azure Databricks to Datadog. The library I’m using (kamon-datadog @ https://github.com/kamon-io/Kamon/tree/v2.1.1) uses OkHttp3 (version 3.14.7, though the problem also occurs with 4.7.2) to communicate with the Datadog API at https://api.datadoghq.com/

Running the Spark job locally works without any problems. Running it on Databricks fails with a handshake_failure error. After extensive debugging, I narrowed the problem down to the combination of OkHttp3 and Databricks.

How to reproduce

I can reliably reproduce the issue on Databricks 6.5 (Spark 2.4.5 / Scala 2.11 / Java 1.8.0_242) and 7.0 (Spark 3.0 / Scala 2.12 / Java 1.11.655). Start a new notebook in Azure Databricks on a cluster with the OkHttp3 4.7.2 library installed (any older version I tried behaves the same) and run the following code:

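import okhttp3.{OkHttpClient, Request}

// Default client: uses OkHttp's built-in MODERN_TLS connection spec.
val cl: OkHttpClient = new OkHttpClient()
val req = new Request.Builder()
  .url("https://api.datadoghq.com/account/login?next=%2F")
  .get()
  .build()
// On Databricks this throws SSLHandshakeException during the TLS handshake.
val resp = cl.newCall(req).execute()
// Close the response so the connection is released.
try resp.toString finally resp.close()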

Stacktrace

This is the full stacktrace I receive (on Java 8):

javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
	at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1127)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.kt:367)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.kt:325)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:197)
	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:249)
	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:108)
	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:76)
	at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:245)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:96)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:100)
	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:197)
	at okhttp3.internal.connection.RealCall.execute(RealCall.kt:148)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2919730791141205:21)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2919730791141205:71)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw$$iw.<init>(command-2919730791141205:73)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw$$iw.<init>(command-2919730791141205:75)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw$$iw.<init>(command-2919730791141205:77)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$$iw.<init>(command-2919730791141205:79)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read.<init>(command-2919730791141205:81)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$.<init>(command-2919730791141205:85)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$read$.<clinit>(command-2919730791141205)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$eval$.$print$lzycompute(<notebook>:7)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$eval$.$print(<notebook>:6)
	at line4f279a02c14c4d08aec3bb073b25d6e927.$eval.$print(<notebook>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
	at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
	at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
	at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
	at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
	at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
	at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
	at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
	at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
	at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
	at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
	at scala.util.Try$.apply(Try.scala:192)
	at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
	at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
	at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
	at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
	at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
	at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
	at java.lang.Thread.run(Thread.java:748)

Workaround

I found a similar issue on Stackoverflow: https://stackoverflow.com/questions/57297159/databricks-job-getting-javax-net-ssl-sslhandshakeexception-received-fatal-alert

Based on the “solution” there, I replaced the OkHttp request with Unirest, which works without any issues both locally and on Databricks.

In conclusion

It seems that something in the combination of Databricks and OkHttp3 makes specific SSL connections fail. Connections to Datadog fail, but several other sites I tried work fine with OkHttp on Databricks.

If there’s any other info I can provide, let me know.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

2 reactions
voutilad commented, Mar 2, 2021

Since I found this issue via Google when working on a similar problem with Databricks, I figured I’d share some explanation for those in this thread.

Databricks sets java.security.properties=/databricks/spark/dbconf/java/extra.security. The file contains a single line:

jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, EC keySize < 224, 3DES_EDE_CBC, anon, NULL, GCM

The default value of jdk.tls.disabledAlgorithms in Java 8 (according to https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html) is:

jdk.tls.disabledAlgorithms=SSLv3, RC4, MD5withRSA, DH keySize < 1024, EC keySize < 224
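To compare the two environments, a small diagnostic like the one below can be run both locally and in a Databricks notebook. It is a sketch that uses only the JDK classes `java.security.Security` and `javax.net.ssl.SSLContext`. The object name `TlsDiagnostics` is hypothetical. It prints the effective `jdk.tls.disabledAlgorithms` value and the GCM cipher suites that the default SSL context still enables. With the Databricks override above in effect, we would expect the GCM list to be empty, but that expectation is an assumption, not something verified on a cluster.

```scala
import java.security.Security
import javax.net.ssl.SSLContext

// Hypothetical helper: inspect the JVM's TLS restrictions using JDK APIs only.
object TlsDiagnostics {
  // Effective value, after any java.security.properties override is applied.
  def disabledAlgorithms: String =
    Option(Security.getProperty("jdk.tls.disabledAlgorithms")).getOrElse("")

  // Cipher suites the default SSLContext enables for new connections.
  def enabledSuites: Seq[String] =
    SSLContext.getDefault.getDefaultSSLParameters.getCipherSuites.toSeq

  // Subset of the enabled suites that use AES-GCM.
  def gcmSuites: Seq[String] = enabledSuites.filter(_.contains("_GCM_"))

  def main(args: Array[String]): Unit = {
    println(s"jdk.tls.disabledAlgorithms = $disabledAlgorithms")
    println(s"${enabledSuites.size} enabled suites, ${gcmSuites.size} of them GCM:")
    gcmSuites.foreach(s => println(s"  $s"))
  }
}
```

In a notebook cell, `TlsDiagnostics.main(Array.empty[String])` prints the report for the driver JVM.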

For OkHttp users on Databricks with SSL handshake issues that are potentially related to cipher support, you may need to change that file. (In my case, disallowing GCM was the issue… and it seems they may have done that because of performance issues with GCM in Java 8 at the time.) See my solution, which uses an init script for the Databricks cluster: https://github.com/neo4j-contrib/neo4j-spark-connector/issues/300#issuecomment-788300727
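The fix linked above edits `extra.security` through a cluster init script. As a different, in-process alternative, the sketch below removes a single entry from the security property at runtime with `java.security.Security.setProperty`. The object and method names are hypothetical. The approach rests on an assumption worth testing: JSSE appears to read `jdk.tls.disabledAlgorithms` only once. If so, the override should take effect only if it runs before the first TLS connection in that JVM, and on Spark it would have to run separately on the driver and on every executor.

```scala
import java.security.Security

// Hypothetical helper; not the init-script approach from the comment above.
object ReenableCipher {
  val Key = "jdk.tls.disabledAlgorithms"

  // Remove one exact entry (e.g. "GCM") from a comma-separated algorithm list,
  // leaving entries such as "DH keySize < 1024" untouched.
  def withoutEntry(value: String, entry: String): String =
    value.split(",").map(_.trim).filter(e => e.nonEmpty && e != entry).mkString(", ")

  // Rewrites the JVM-wide property. Assumption: only effective if called
  // before the first TLS handshake in this JVM.
  def reenable(entry: String): Unit = {
    val current = Option(Security.getProperty(Key)).getOrElse("")
    Security.setProperty(Key, withoutEntry(current, entry))
  }
}
```

Calling `ReenableCipher.reenable("GCM")` at the very start of the job, before any HTTP client is created, re-enables only GCM and keeps every other restriction in the Databricks list.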

0 reactions
yschimke commented, Mar 2, 2021

Thanks for the update and explanation.

Reading that last post, it seems that two opinionated frameworks are incompatible by default:

  • Databricks - choosing only efficient cipher suites
  • OkHttp - defaulting to only modern secure cipher suites

This isn’t really a bug, because you can always opt OkHttp back into some accepted ciphers, but it explains why OkHttp fails by default while Unirest works. I’m also glad the performance issue with GCM seems to be fixed in 8+ now.
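The “opt back in” route mentioned here can be sketched with OkHttp's `ConnectionSpec` API. This is an illustration and not a fix endorsed in the thread. It assumes OkHttp 3.14 or 4.x on the classpath, and the object name `JvmCipherClient` is hypothetical. The sketch starts from `ConnectionSpec.MODERN_TLS` and calls `allEnabledCipherSuites()`, so OkHttp offers whatever suites the JVM has enabled instead of intersecting them with its own list. That is presumably closer to how Unirest behaves. The trade-off is that OkHttp's cipher-suite curation no longer applies, and whether a handshake succeeds still depends on what the JVM (and Databricks' `extra.security`) leaves enabled.

```scala
import java.util.Arrays
import okhttp3.{ConnectionSpec, OkHttpClient, Request}

// Hypothetical wrapper around an OkHttp client that defers to the JVM's ciphers.
object JvmCipherClient {
  // Keep MODERN_TLS's TLS versions, but let the JVM's enabled cipher suites
  // decide what is offered instead of OkHttp's built-in allow-list.
  val jvmCiphers: ConnectionSpec = new ConnectionSpec.Builder(ConnectionSpec.MODERN_TLS)
    .allEnabledCipherSuites()
    .build()

  // CLEARTEXT is kept so plain http:// URLs still work, as in OkHttp's defaults.
  val client: OkHttpClient = new OkHttpClient.Builder()
    .connectionSpecs(Arrays.asList(jvmCiphers, ConnectionSpec.CLEARTEXT))
    .build()

  // Same request as the reproduction above; closes the response afterwards.
  def fetch(url: String): String = {
    val req = new Request.Builder().url(url).get().build()
    val resp = client.newCall(req).execute()
    try resp.toString finally resp.close()
  }
}
```

Usage would be `JvmCipherClient.fetch("https://api.datadoghq.com/account/login?next=%2F")`. An alternative is to list specific suites with `cipherSuites(...)` on the builder, which keeps the set explicit.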

https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8177784 https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8201633

