[SUPPORT] Metadata table throws HBase exceptions
Description
We’re running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 to take advantage of the metadata table feature. We tried to run a simple Hudi write with generated data and got the attached stacktrace.
We used this Hudi package: org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1.
The exception suggests that something is not compatible with the HBase version that Hudi is compiled against. Unfortunately, Cloudera provides HBase in version 2.2.3. As far as we understood, HBase is only used if the index type is set to HBASE, so we’re not sure why Hudi needs the HBase class here.
If we set hoodie.metadata.enable to false it works, but we want to take advantage of this feature.
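As a minimal sketch of that workaround (the same write as in the example below, with only the metadata table disabled):

df.write.format("hudi")
  // ... same options as in the example write below ...
  .option("hoodie.metadata.enable", false) // workaround: skip the metadata table and its HFile/HBase code path
  .mode("append")
  .save("hdfs:///.../hudi_11_1_metadata")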
We tried two things to get rid of this exception:
- Set the index type to BLOOM -> no effect
- Explicitly add the HBase server and client jars, in the version Hudi is compiled against, to the spark-shell -> no effect
Environment Description
- Hudi version : 0.11.1
- Spark version : 3.1.1
- Hive version : 3.1.3
- Hadoop version : 3.1.1
- Storage (HDFS/S3/GCS…) : HDFS
- Running on Docker? (yes/no) : no -> YARN on Cloudera CDP 7.1.7
Additional context
Example write:
df.write.format("hudi")
.option(HIVE_CREATE_MANAGED_TABLE.key(), false)
.option(HIVE_DATABASE.key(), "db_demo")
.option(HIVE_SYNC_ENABLED.key(), true)
.option(HIVE_SYNC_MODE.key(), "HMS")
.option(HIVE_TABLE.key(), "ht_hudi_11_1_metadata")
.option("hoodie.table.name", "ht_hudi_11_1_metadata")
.option(KEYGENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
.option(OPERATION.key(), "upsert")
.option(PRECOMBINE_FIELD.key(), "sequence")
.option(RECORDKEY_FIELD.key(), "id")
.option(TABLE_NAME.key(), "ht_hudi_11_1_metadata")
.option("hoodie.index.type","BLOOM")
.option("hoodie.metadata.enable", true)
.mode("append")
.save("hdfs:///.../hudi_11_1_metadata")
Stacktrace
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
... 28 more
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (2.2.3.7.1.7.0-551), this version is 2.4.9
at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:74)
at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:84)
at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:98)
at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Context.<init>(Context.java:44)
at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<init>(Encryption.java:110)
at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<clinit>(Encryption.java:107)
... 36 more
........
22/08/12 08:19:20 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9) (hdl-w05.charite.de executor 1): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
Top GitHub Comments
Thanks all, finally the issue was that Cloudera provides the configuration for the parameter hbase.defaults.for.version.skip. After overriding the parameter in Cloudera Manager it works.
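For reference, this is roughly what the override amounts to once it lands in hbase-site.xml (the exact Cloudera Manager setting used, e.g. a safety valve snippet, is an assumption; the property itself is the standard HBase switch that disables the hbase-default.xml version check failing in the stacktrace above):

<property>
  <name>hbase.defaults.for.version.skip</name>
  <value>true</value>
  <!-- skip the check comparing hbase-default.xml (2.2.3.7.1.7.0-551) against the shaded HBase 2.4.9 -->
</property>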
Thanks for your response, I already saw this in the pom file. That’s why we started our spark-shell with the HBase jars in this version, but unfortunately with no success. I guess they are the last ones on the classpath.
I was wondering why the HBase lib is mandatory for the metadata table. Where is the metadata stored when we’re setting this option?
.option("hoodie.index.type","BLOOM")