[SUPPORT] java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
Spark submit fails immediately with hudi-spark3.2-bundle_2.12:0.11.0 and Kerberos authentication.
Executing the following in our environment results in the error above:
/usr/bin/spark3-submit --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" --num-executors 4 --principal vdp@BDA2.VDAB.BE --keytab vdp2.keytab test_hudi_schema_evolution.py
Code in the Python script:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType
spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
.getOrCreate()
Maybe we need some extra configuration related to Kerberos authentication. In the logs, however, we can see that we get authenticated correctly.
To Reproduce
Not sure how easy this is to reproduce. We apply Kerberos authentication through a keytab file, as you can see in the spark3-submit command above, but basically we never get past the initial SparkSession getOrCreate() call.
Expected behavior
No exceptions.
Environment Description
- Hudi version : 0.11.0
- Spark version : 3.2
- Hive version : 3.1.3000
- Hadoop version : 3.1.1.7
- Storage (HDFS/S3/GCS…) : HDFS
- Running on Docker? (yes/no) : no
Additional context
Running Kerberos authentication with a keytab file.
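A workaround sometimes suggested for this class of failure (not mentioned in the original report, so treat it as an assumption) is to disable Spark's HBase delegation token provider, so the token-decoding code path that throws here is never reached. Spark supports turning off a specific provider via `spark.security.credentials.<service>.enabled`. If the job does not actually need an HBase delegation token, a variant of the submit command along these lines may avoid the error:

```shell
# Sketch of a possible workaround, assuming the job does not need an
# HBase delegation token. spark.security.credentials.<service>.enabled
# is a standard Spark setting for disabling one token provider.
/usr/bin/spark3-submit \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf "spark.security.credentials.hbase.enabled=false" \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --num-executors 4 \
  --principal vdp@BDA2.VDAB.BE --keytab vdp2.keytab \
  test_hudi_schema_evolution.py
```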
Stacktrace
Exception thrown:
Traceback (most recent call last):
File "/home/dbrys1/test_hudi_schema_evolution.py", line 22, in <module>
spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 392, in getOrCreate
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 147, in __init__
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1574, in __call__
File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier
at org.apache.hudi.org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier.readFields(AuthenticationTokenIdentifier.java:142)
at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:192)
at org.apache.hadoop.security.token.Token.identifierToString(Token.java:444)
at org.apache.hadoop.security.token.Token.toString(Token.java:464)
at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.$anonfun$obtainDelegationTokens$2(HBaseDelegationTokenProvider.scala:52)
at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.logInfo(HBaseDelegationTokenProvider.scala:34)
at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:52)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:226)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:224)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainTokensAndScheduleRenewal(HadoopDelegationTokenManager.scala:224)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$updateTokensTask(HadoopDelegationTokenManager.scala:198)
at org.apache.spark.deploy.security.HadoopDelegationTokenManager.start(HadoopDelegationTokenManager.scala:123)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1(CoarseGrainedSchedulerBackend.scala:552)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1$adapted(CoarseGrainedSchedulerBackend.scala:549)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:549)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:48)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 47 more
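The `org.apache.hudi.` prefix on the missing class is worth noting: the Hudi bundle jars relocate (shade) their bundled HBase classes under that package, which is why the stack trace shows a shaded class name rather than the plain HBase one. A small illustrative sketch (the helper function is hypothetical, written just to show the naming):

```python
# Hudi's bundle jars shade bundled HBase classes under org.apache.hudi.,
# so the class missing in the stack trace is the relocated form of a
# plain hbase-protocol class. unshaded_name() is a hypothetical helper
# that strips the relocation prefix to recover the original class name.
SHADE_PREFIX = "org.apache.hudi."

def unshaded_name(fqcn: str) -> str:
    """Strip the Hudi relocation prefix, if present."""
    if fqcn.startswith(SHADE_PREFIX):
        return fqcn[len(SHADE_PREFIX):]
    return fqcn

missing = ("org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated."
           "AuthenticationProtos$TokenIdentifier")
print(unshaded_name(missing))
# -> org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
```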
Issue Analytics
- Created: a year ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
@yihua thank you! A build from the master branch worked for me.
@m2java @xushiyan `AuthenticationProtos` is from hbase-protocol, which is not included in the bundle and shading process. That is the root cause of the ClassNotFoundException. I created the fix #5750 to address this issue. @m2java let us know if that works for you.
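As a quick diagnostic (not part of the original thread), one can check whether a given bundle jar actually contains the shaded class by listing its contents; the jar path below is hypothetical and should point at your local copy of the bundle:

```shell
# Hypothetical local path; adjust to wherever the bundle jar was downloaded.
# An empty result means the relocated hbase-protocol classes are missing
# from this bundle, matching the ClassNotFoundException above.
jar tf hudi-spark3.2-bundle_2.12-0.11.0.jar | grep 'AuthenticationProtos'
```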