Cannot initialize SSL - "Cannot open stream" exception by EsHadoop connector for spark after upgrading to version 2.2
Issue description
After upgrading the local Spark version to 2.2, the ES-Hadoop connector fails to connect to the remote Elasticsearch cluster with the following error:
Steps to reproduce
Any read/write operation fails. For example (replace the path/host before executing this):
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.elasticsearch.spark.sql._

val conf = new SparkConf().set("spark.master", "local[2]")
val sparkSession = SparkSession.builder().config(conf).getOrCreate()
import sparkSession.implicits._

val df: DataFrame = Seq("somevalue").toDF("mycol")
val opts = Map(
  "es.index.auto.create" -> "true",
  "es.read.metadata" -> "true",
  "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
  "es.net.http.auth.user" -> "admin",
  "es.net.http.auth.pass" -> "somepass",
  "es.net.ssl" -> "true",
  "es.net.ssl.cert.allow.self.signed" -> "true",
  "es.net.ssl.truststore.location" -> "//some/path",
  "es.net.ssl.truststore.pass" -> "ibicloud",
  "es.nodes" -> "somehost",
  "es.port" -> "9997"
)
df.saveToEs("someindex", opts)
Stack trace:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:283)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:572)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:96)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:96)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Cannot open stream (inlined queries need to be marked as such through `?` and `{}`) for resource file://
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:168)
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.getSSLContext(SSLSocketFactory.java:153)
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSocket(SSLSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:478)
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:112)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:635)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
... 10 more
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream (inlined queries need to be marked as such through `?` and `{}`) for resource file://
at org.elasticsearch.hadoop.util.IOUtils.open(IOUtils.java:180)
at org.elasticsearch.hadoop.util.IOUtils.open(IOUtils.java:185)
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadKeyStore(SSLSocketFactory.java:178)
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadTrustManagers(SSLSocketFactory.java:204)
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:166)
... 25 more
17/07/19 19:52:16 WARN TaskSetManager: Lost task 0.0 in stage 13.0 (TID 815, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
(followed by the same stack trace as above)
Version Info
I first experienced this with ES-Hadoop version 5.4.1. I tried every version up to the latest (6.0.0-alpha2), and none of them resolved the issue.
- OS: Windows
- JVM: 1.8
- Hadoop/Spark: 2.2
- ES-Hadoop: 5.4.1 - 6.0.0-alpha2
- ES: 5.4.0
Issue Analytics
- Created 6 years ago
- Comments: 8 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks for checking it. This runs locally, and I have access to the truststore path, which does exist (when Spark runs with version 2.1 it can access it). I'll try again tomorrow when we have more information about the cause.
So, the problem is actually not in the connector itself (although it can provide a workaround) but in the core Spark libraries, since IOUtils.open() does: new URL(resource).openStream()
This relies on the URL stream handler factory, which Spark sets to org.apache.hadoop.fs.FsUrlStreamHandlerFactory inside SharedState.scala.
It is possible that the behavior of Hadoop's RawLocalFileSystem changed between 2.1 and 2.2.0, but either way, UNC paths for loading the truststore (such as in the original post) no longer work, even though they did in 2.1. The connector could provide a workaround by recognizing the UNC path and loading the file directly. For a path like "\\shared\folder\truststore.jks", something along these lines would work in IOUtils.open():
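A minimal sketch of the workaround described above: detect Windows/UNC paths that java.net.URL cannot handle once Spark installs Hadoop's FsUrlStreamHandlerFactory, and open them with a FileInputStream directly. The names here (openResource, looksLikeWindowsPath) are illustrative, not the actual es-hadoop IOUtils API.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class TruststoreOpen {

    // UNC shares ("\\server\share\file") and drive-letter paths ("C:\...")
    // are not valid URLs, so URL-based loading must be bypassed for them.
    static boolean looksLikeWindowsPath(String resource) {
        return resource.startsWith("\\\\")                           // UNC share
            || (resource.length() > 1 && resource.charAt(1) == ':'); // C:\...
    }

    static InputStream openResource(String resource) throws IOException {
        if (looksLikeWindowsPath(resource)) {
            // Open the local/shared file directly, sidestepping the
            // URL stream handler factory that Spark has installed.
            return new FileInputStream(resource);
        }
        // Fall back to the original behavior for proper URLs.
        return new URL(resource).openStream();
    }
}
```

Only the path-detection branch is new; everything else keeps the existing `new URL(resource).openStream()` behavior, so non-Windows truststore locations are unaffected.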