Cannot initialize SSL - "Cannot open stream" exception by EsHadoop connector for spark after upgrading to version 2.2

Issue description

After upgrading the local version of Spark to 2.2, the ES-Hadoop connector fails to connect to the remote Elasticsearch cluster with the error shown in the stack trace below.

Steps to reproduce

Any read/write operation fails. For example (replace the path/host placeholders prior to executing this):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.elasticsearch.spark.sql._   // brings saveToEs into scope

    val conf = new SparkConf().set("spark.master", "local[2]")
    val sparkSession = SparkSession.builder().config(conf).getOrCreate()
    import sparkSession.implicits._

    // single-column DataFrame, just enough to trigger a write
    val df: DataFrame = Seq("somevalue").toDF("mycol")

    val opts = Map(
      "es.index.auto.create" -> "true",
      "es.read.metadata" -> "true",
      "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
      "es.net.http.auth.user" -> "admin",
      "es.net.http.auth.pass" -> "somepass",
      "es.net.ssl" -> "true",
      "es.net.ssl.cert.allow.self.signed" -> "true",
      "es.net.ssl.truststore.location" -> "//some/path",
      "es.net.ssl.truststore.pass" -> "ibicloud",
      "es.nodes" -> "somehost",
      "es.port" -> "9997"
    )

    df.saveToEs("someindex", opts)

Stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:283)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:572)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:96)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:96)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Cannot open stream (inlined queries need to be marked as such through `?` and `{}`) for resource file://
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:168)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.getSSLContext(SSLSocketFactory.java:153)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSocket(SSLSocketFactory.java:122)
	at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:478)
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:112)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
	at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:635)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
	... 10 more
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream (inlined queries need to be marked as such through `?` and `{}`) for resource file://
	at org.elasticsearch.hadoop.util.IOUtils.open(IOUtils.java:180)
	at org.elasticsearch.hadoop.util.IOUtils.open(IOUtils.java:185)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadKeyStore(SSLSocketFactory.java:178)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadTrustManagers(SSLSocketFactory.java:204)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:166)
	... 25 more
17/07/19 19:52:16 WARN TaskSetManager: Lost task 0.0 in stage 13.0 (TID 815, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:283)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:572)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:96)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:96)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Cannot open stream (inlined queries need to be marked as such through `?` and `{}`) for resource file://
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:168)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.getSSLContext(SSLSocketFactory.java:153)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSocket(SSLSocketFactory.java:122)
	at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:478)
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:112)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
	at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:635)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
	... 10 more
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream (inlined queries need to be marked as such through `?` and `{}`) for resource file://
	at org.elasticsearch.hadoop.util.IOUtils.open(IOUtils.java:180)
	at org.elasticsearch.hadoop.util.IOUtils.open(IOUtils.java:185)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadKeyStore(SSLSocketFactory.java:178)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadTrustManagers(SSLSocketFactory.java:204)
	at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:166)
	... 25 more

Version Info

I first experienced this with ES-Hadoop version 5.4.1. Upgrading through every version up to the latest (6.0.0-alpha2) did not resolve the issue.

  • OS: Windows
  • JVM: 1.8
  • Hadoop/Spark: 2.2
  • ES-Hadoop: 5.4.1 - 6.0.0-alpha2
  • ES: 5.4.0

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
yosid16 commented, Jul 19, 2017

Thanks for checking it. This runs locally, and I have access to the truststore path, which does exist (when Spark runs with version 2.1 it is able to access it). I’ll try again tomorrow when we have more information about the cause.

0 reactions
dmarkhas commented, Jan 27, 2018

So, the problem is actually not with the connector itself (although it can provide a workaround) but in the core Spark libraries. Since IOUtils.open() does new URL(resource).openStream(), it relies on the JVM-wide URL stream handler factory, which Spark sets to org.apache.hadoop.fs.FsUrlStreamHandlerFactory inside SharedState.scala.
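
For reference, here is a minimal sketch of that wiring, assuming the Spark 2.2 behavior described above (the exact code in SharedState.scala may differ slightly between releases):

    import java.net.URL
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

    // java.net.URL accepts only a single factory per JVM. Once this call has run,
    // the factory is consulted for every new URL(...), and schemes Hadoop knows
    // (such as file: and hdfs:) are served by Hadoop's filesystem layer rather
    // than the JDK's built-in handlers.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())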

It is possible that the behavior of the Hadoop RawLocalFileSystem changed between 2.1 and 2.2.0, but either way, UNC paths for loading the truststore (such as in the original post) no longer work, even though they did in 2.1. The connector could provide a workaround by recognizing the UNC path and loading the file directly. So for a path like "\\shared\folder\truststore.jks", something along these lines would work in IOUtils.open():

    // treat UNC-style paths (e.g. \\shared\folder\truststore.jks) as plain files
    // and open them directly, bypassing URL resolution
    if (resource.startsWith("\\")) {
        return new FileInputStream(new File(resource));
    }
    // no prefix means classpath
    else if (!resource.contains(":")) {
        return loader.getResourceAsStream(resource);
    }
    return new URL(resource).openStream();
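
Until something along those lines lands in the connector, a possible user-side workaround (a sketch only, under the assumption that the truststore can be copied to a local drive; the paths and passwords below are hypothetical) is to point es.net.ssl.truststore.location at an explicit file: URI, so that the resource string reaches the new URL(resource).openStream() branch with a scheme the installed handler can resolve:

    val opts = Map(
      "es.net.ssl" -> "true",
      "es.net.ssl.cert.allow.self.signed" -> "true",
      // hypothetical local copy of the truststore, addressed with a file: URI
      // instead of a UNC path
      "es.net.ssl.truststore.location" -> "file:///C:/certs/truststore.jks",
      "es.net.ssl.truststore.pass" -> "changeit",
      "es.nodes" -> "somehost",
      "es.port" -> "9997"
    )

    df.saveToEs("someindex", opts)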