Spline integration with Spark 2.3 throws java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'

Issue Description

Hi, I built Spline (v0.7.11-SNAPSHOT) for Spark 2.3 and Scala 2.11. Running it with Python 2 or Python 3 throws the following exception:

session key: spark.app.name
session value: query embedding extractor sanyu
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1069)
	 at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
	 at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
	 at scala.Option.getOrElse(Option.scala:121)
	 at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
	 at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
	 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	 at java.lang.reflect.Method.invoke(Method.java:498)
	 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	 at py4j.Gateway.invoke(Gateway.java:282)
	 at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	 at py4j.commands.CallCommand.execute(CallCommand.java:79)
	 at py4j.GatewayConnection.run(GatewayConnection.java:238)
	 at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
  File "/mnt/sdd1/yarn/local/usercache/app/appcache/application_1656468332243_119007/container_e10_1656468332243_119007_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/mnt/sdd1/yarn/local/usercache/app/appcache/application_1656468332243_119007/container_e10_1656468332243_119007_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o92.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1069)
	at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
	at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
	at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Session is unexpectedly missing. Spline cannot be initialized.
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$2.apply(SplineQueryExecutionListener.scala:35)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$2.apply(SplineQueryExecutionListener.scala:35)
	at scala.Option.getOrElse(Option.scala:121)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.<init>(SplineQueryExecutionListener.scala:35)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2826)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2815)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2815)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$$lessinit$greater$1.apply(QueryExecutionListener.scala:83)
	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$$lessinit$greater$1.apply(QueryExecutionListener.scala:82)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.sql.util.ExecutionListenerManager.<init>(QueryExecutionListener.scala:82)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$listenerManager$2.apply(BaseSessionStateBuilder.scala:270)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$listenerManager$2.apply(BaseSessionStateBuilder.scala:270)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.listenerManager(BaseSessionStateBuilder.scala:269)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:297)
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1066)
	... 16 more


The submit command is:

/spark2/bin/spark-submit \
  --conf spark.pyspark.python=/data/venv/hdp-envpy3/bin/python \
  --conf spark.pyspark.driver.python=/data/venv/hdp-envpy3/bin/python \
  --num-executors 2 --executor-memory 1G --driver-memory 1G \
  --name test_lineage \
  --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
  --conf spark.spline.mode=BEST_EFFORT \
  --conf spark.spline.lineageDispatcher=composite \
  --conf spark.spline.lineageDispatcher.composite.dispatchers=logging,http \
  --conf spark.spline.lineageDispatcher.http.producer.url=http://172.18.221.156:8080/producer \
  --deploy-mode cluster \
  tablelineage1.py

It seems that SparkSession.getOrCreate triggers initialization of the QueryExecutionListener, but SplineQueryExecutionListener needs the SparkSession to be active already. Is this a deadlock on Spark 2.3?
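The ordering problem described above can be illustrated with a toy model. This is not Spark or Spline code: `SessionRegistry`, `LineageListener`, and `ToySession` are hypothetical names, and the model only mirrors what the stack trace shows (building the session state constructs the listener, and the listener's constructor fails when it cannot find a session). It suggests the failure is an initialization-order issue rather than a deadlock between threads.

```python
class SessionRegistry:
    """Toy stand-in for the place where a default session would be registered."""

    def __init__(self):
        self.default_session = None


class LineageListener:
    """Toy listener that needs an already-registered session in its constructor."""

    def __init__(self, registry):
        if registry.default_session is None:
            raise RuntimeError("Session is unexpectedly missing.")
        self.session = registry.default_session


class ToySession:
    """Toy session whose state is built lazily, constructing the listener."""

    def __init__(self, registry, register_as_default):
        self.registry = registry
        self._state = None
        if register_as_default:
            registry.default_session = self

    def session_state(self):
        # First access builds the state, which instantiates the listener.
        if self._state is None:
            self._state = LineageListener(self.registry)
        return self._state


def try_session_state(register_as_default):
    """Return "ok" or a failure message for the given registration order."""
    session = ToySession(SessionRegistry(), register_as_default)
    try:
        session.session_state()
        return "ok"
    except RuntimeError as err:
        return f"failed: {err}"


if __name__ == "__main__":
    print(try_session_state(register_as_default=False))
    print(try_session_state(register_as_default=True))
```

In this model the listener only succeeds when the session was registered before its state was first accessed, which is the condition the listener in the trace appears to depend on.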

Issue Analytics

  • State: closed
  • Created: 5 months ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
wineternity commented, Aug 27, 2022

I found the reason: it is related to https://issues.apache.org/jira/browse/SPARK-23228, so Spline codeless mode does not work with PySpark versions below 2.4.0.
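A minimal sketch of how a job could guard against this, assuming the 2.4.0 cutoff stated in the comment above. `supports_codeless_init` and `enable_spline_programmatically` are hypothetical helper names. The second helper uses the programmatic initialization call described in the Spline agent documentation for PySpark; the thread does not confirm whether it resolves the problem on Spark 2.3, so treat it as something to try rather than a verified fix.

```python
def supports_codeless_init(spark_version: str) -> bool:
    """Return True if the PySpark version is 2.4.0 or newer.

    Assumption: the cutoff comes from the comment in this thread,
    which ties the failure to SPARK-23228.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (2, 4)


def enable_spline_programmatically(spark):
    """Enable Spline on an existing pyspark.sql.SparkSession.

    Assumption: the Spline agent jar is on the driver classpath and
    spark.sql.queryExecutionListeners is NOT set to the Spline listener.
    """
    jvm = spark.sparkContext._jvm
    jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(
        spark._jsparkSession
    )


if __name__ == "__main__":
    for version in ("2.3.0", "2.4.0", "3.1.2"):
        print(version, supports_codeless_init(version))
```

A job could call `supports_codeless_init(spark.version)` after creating the session and fall back to `enable_spline_programmatically(spark)` when it returns False, with the listener `--conf` removed from the submit command.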

0 reactions
cerveada commented, Aug 29, 2022

Thanks for letting us know.
