
[SUPPORT] java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier

See original GitHub issue

spark-submit fails immediately with hudi-spark3.2-bundle_2.12:0.11.0 and Kerberos authentication.

Executing the following in our environment produces the error above:


/usr/bin/spark3-submit \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --num-executors 4 \
  --principal vdp@BDA2.VDAB.BE \
  --keytab vdp2.keytab \
  test_hudi_schema_evolution.py

Code in the Python script:

import pyspark

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType

spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
    .getOrCreate()
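Since the exception says a relocated (shaded) class is missing, one quick sanity check is to scan the bundle jar's entry list for that class — a jar is just a zip archive, so no JVM is needed. This is a diagnostic sketch; the jar path below is an assumption (Spark's `--packages` usually caches jars under `~/.ivy2/jars/`), so adjust it for your environment:

```python
import os
import zipfile

def has_class(jar_path: str, class_name: str) -> bool:
    """Return True if the fully-qualified class is packaged in the jar.

    Jar entries mirror package paths, so "a.b.C" is stored as
    "a/b/C.class" (inner classes keep their '$' separator).
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Hypothetical local path to the cached bundle jar; adjust as needed.
jar = os.path.expanduser(
    "~/.ivy2/jars/org.apache.hudi_hudi-spark3.2-bundle_2.12-0.11.0.jar")
missing = ("org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated."
           "AuthenticationProtos$TokenIdentifier")
if os.path.exists(jar):
    print(has_class(jar, missing))
```

If this prints False, the class was never shaded into the bundle, which matches the maintainer analysis further down in this thread.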

Perhaps we are missing some extra configuration related to Kerberos authentication. However, the logs show that we are authenticated correctly.

To Reproduce

Not sure how easy this is to reproduce. We apply Kerberos authentication through a keytab file, as shown in the spark3-submit command above, but essentially we never get past the basic SparkSession getOrCreate call.

Expected behavior

No exceptions.

Environment Description

  • Hudi version : 0.11.0

  • Spark version : 3.2

  • Hive version : 3.1.3000

  • Hadoop version : 3.1.1.7

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

Running Kerberos authentication with a keytab file.
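A possible interim workaround (an assumption, not verified against this exact setup): Spark allows disabling individual delegation-token providers via spark.security.credentials.&lt;service&gt;.enabled. Since the stacktrace shows the failure inside Spark's HBaseDelegationTokenProvider while decoding an HBase token, disabling that one provider should skip the code path that loads the broken shaded class — but only do this if the job does not actually need HBase delegation tokens:

```shell
# Sketch of a possible workaround (unverified): disable Spark's HBase
# delegation-token provider so the missing shaded HBase class is never
# loaded. Only safe if the job does not need HBase delegation tokens.
/usr/bin/spark3-submit \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf "spark.security.credentials.hbase.enabled=false" \
  --principal vdp@BDA2.VDAB.BE \
  --keytab vdp2.keytab \
  test_hudi_schema_evolution.py
```

HDFS token renewal via the principal/keytab pair is unaffected; only the HBase token provider is skipped.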

Stacktrace

Exception thrown:

Traceback (most recent call last):
  File "/home/dbrys1/test_hudi_schema_evolution.py", line 22, in <module>
    spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 392, in getOrCreate
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 147, in __init__
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1574, in __call__
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier
        at org.apache.hudi.org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier.readFields(AuthenticationTokenIdentifier.java:142)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:192)
        at org.apache.hadoop.security.token.Token.identifierToString(Token.java:444)
        at org.apache.hadoop.security.token.Token.toString(Token.java:464)
        at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.$anonfun$obtainDelegationTokens$2(HBaseDelegationTokenProvider.scala:52)
        at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
        at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
        at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.logInfo(HBaseDelegationTokenProvider.scala:34)
        at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:52)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:226)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:224)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainTokensAndScheduleRenewal(HadoopDelegationTokenManager.scala:224)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$updateTokensTask(HadoopDelegationTokenManager.scala:198)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.start(HadoopDelegationTokenManager.scala:123)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1(CoarseGrainedSchedulerBackend.scala:552)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1$adapted(CoarseGrainedSchedulerBackend.scala:549)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:549)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:48)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 47 more

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

m2java commented on Jun 14, 2022 (1 reaction)

@yihua thank you! Build from master branch worked for me.
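For anyone following the same route, a rough sketch of building the bundle from source — the Maven profile flags below are assumptions and may differ per branch, so consult the build instructions in the Hudi README before running:

```shell
# Sketch: build the Hudi Spark 3.2 bundle from the master branch.
# The -Dspark3.2 / -Dscala-2.12 profile flags are assumptions;
# check the Hudi README for the flags your branch expects.
git clone https://github.com/apache/hudi.git
cd hudi
mvn clean package -DskipTests -Dspark3.2 -Dscala-2.12
# The bundle jar should then appear under packaging/hudi-spark-bundle/target/
```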

yihua commented on Jun 5, 2022 (1 reaction)

@m2java @xushiyan AuthenticationProtos comes from hbase-protocol, which is not included in the bundle and its shading process. That is the root cause of the ClassNotFoundException.

I created the fix #5750 to address this issue. @m2java let us know if that works for you.
