Spline integration with Spark 2.3 throws java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'
Hi, I built Spline for Spark 2.3 and Scala 2.11 and ran it with both Python 2 and Python 3, but the following exception is thrown:
spline: v0.7.11-SNAPSHOT
session key: spark.app.name
session value: query embedding extractor sanyu
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1069)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
File "/mnt/sdd1/yarn/local/usercache/app/appcache/application_1656468332243_119007/container_e10_1656468332243_119007_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/sdd1/yarn/local/usercache/app/appcache/application_1656468332243_119007/container_e10_1656468332243_119007_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o92.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1069)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Session is unexpectedly missing. Spline cannot be initialized.
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$2.apply(SplineQueryExecutionListener.scala:35)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$2.apply(SplineQueryExecutionListener.scala:35)
at scala.Option.getOrElse(Option.scala:121)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.<init>(SplineQueryExecutionListener.scala:35)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2826)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2815)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2815)
at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$$lessinit$greater$1.apply(QueryExecutionListener.scala:83)
at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$$lessinit$greater$1.apply(QueryExecutionListener.scala:82)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.sql.util.ExecutionListenerManager.<init>(QueryExecutionListener.scala:82)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$listenerManager$2.apply(BaseSessionStateBuilder.scala:270)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$listenerManager$2.apply(BaseSessionStateBuilder.scala:270)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.listenerManager(BaseSessionStateBuilder.scala:269)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:297)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1066)
... 16 more
The submit command is:
/spark2/bin/spark-submit \
  --conf spark.pyspark.python=/data/venv/hdp-envpy3/bin/python \
  --conf spark.pyspark.driver.python=/data/venv/hdp-envpy3/bin/python \
  --num-executors 2 --executor-memory 1G --driver-memory 1G \
  --name test_lineage \
  --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
  --conf spark.spline.mode=BEST_EFFORT \
  --conf spark.spline.lineageDispatcher=composite \
  --conf spark.spline.lineageDispatcher.composite.dispatchers=logging,http \
  --conf spark.spline.lineageDispatcher.http.producer.url=http://172.18.221.156:8080/producer \
  --deploy-mode cluster tablelineage1.py
It seems that SparkSession.getOrCreate triggers initialization of the QueryExecutionListener, but SplineQueryExecutionListener requires the SparkSession to be active already. Is this a deadlock on Spark 2.3?
Top GitHub Comments
I found the reason: it is related to https://issues.apache.org/jira/browse/SPARK-23228, so Spline codeless mode does not work with PySpark versions below 2.4.0.
Thanks for letting us know.
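For anyone hitting the same error, the diagnosis above can be turned into a small guard in the PySpark job. This is only a sketch under stated assumptions: the 2.4.0 threshold comes from the comment above, the helper names (`parse_spark_version`, `codeless_init_supported`, `enable_spline`) are made up for illustration, and the programmatic fallback through `SparkLineageInitializer.enableLineageTracking` is the initialization style described in the Spline agent README, which should be verified against the agent version in use. It has not been confirmed in this thread to work on Spark 2.3.

```python
import re

# Assumption taken from the comment above: codeless init needs PySpark >= 2.4.0.
MIN_CODELESS_PYSPARK = (2, 4, 0)


def parse_spark_version(version):
    """Extract (major, minor, patch) from strings such as '2.3.0' or
    vendor builds such as '2.3.1.3.0.1.0-187'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("Unrecognized Spark version: %r" % version)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def codeless_init_supported(version):
    """True if the listener can be registered via spark.sql.queryExecutionListeners."""
    return parse_spark_version(version) >= MIN_CODELESS_PYSPARK


def enable_spline(spark):
    """Hypothetical helper: choose an initialization style for an existing session.

    Returns the name of the style that was chosen.
    """
    if codeless_init_supported(spark.version):
        # The --conf spark.sql.queryExecutionListeners=... setting is enough.
        return "codeless"
    # Assumed fallback: initialize Spline after the session exists, and do NOT
    # pass spark.sql.queryExecutionListeners on the spark-submit command line.
    jvm = spark.sparkContext._jvm
    jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(
        spark._jsparkSession
    )
    return "programmatic"
```

In the job script this would be called once, right after `SparkSession.builder.getOrCreate()` returns, so that the session already exists by the time Spline looks for it.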