
[SUPPORT] Metadata table throws HBase exceptions

See original GitHub issue

Description

We’re running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 to take advantage of the metadata table feature. We tried a simple Hudi write with generated data and got the attached stack trace.

We used this Hudi package: org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1.

The exception indicates that something may not be compatible with the HBase version Hudi is compiled against. Unfortunately, Cloudera provides HBase in version 2.2.3. As far as we understood, HBase is only used if the index type is set to HBASE, so we’re not sure why Hudi needs the HBase classes here.

If we set hoodie.metadata.enable to false it works, but we want to take advantage of this feature.
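
For reference, the workaround is a single flag on the write path (a minimal sketch, not from the thread; it assumes the same options as the full "Example write" further down):

// Workaround only: disabling the metadata table avoids the shaded-HBase
// HFile code path, at the cost of the feature we actually want.
df.write.format("hudi")
  // ... same options as in "Example write" below ...
  .option("hoodie.metadata.enable", "false")
  .mode("append")
  .save("hdfs:///.../hudi_11_1_metadata")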

We tried two things to get rid of this exception:

  1. Set the index type to BLOOM -> no effect
  2. Explicitly add the HBase server and client jars, in the version Hudi is compiled against, to the spark-shell -> no effect (see the snippet below)
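
That the extra jars have no effect is consistent with the stack trace: the bundle relocates HBase under org.apache.hudi.org.apache.hadoop.hbase, so stock HBase jars can never shadow the shaded classes. A quick way to confirm which jar serves the failing class from the spark-shell (a hypothetical check, not from the thread):

// The failing class is the *relocated* copy inside the Hudi bundle,
// not org.apache.hadoop.hbase.HBaseConfiguration from the added jars.
val shaded = Class.forName("org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration")
println(shaded.getProtectionDomain.getCodeSource.getLocation) // prints the hudi bundle jar path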

Environment Description

  • Hudi version : 0.11.1

  • Spark version : 3.1.1

  • Hive version : 3.1.3

  • Hadoop version : 3.1.1

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no (YARN on Cloudera CDP 7.1.7)

Additional context

Example write:

import org.apache.hudi.DataSourceWriteOptions._ // assumed import providing the HIVE_*, KEYGENERATOR_CLASS_NAME, OPERATION, PRECOMBINE_FIELD, RECORDKEY_FIELD and TABLE_NAME constants

df.write.format("hudi")
  .option(HIVE_CREATE_MANAGED_TABLE.key(), false)
  .option(HIVE_DATABASE.key(), "db_demo")
  .option(HIVE_SYNC_ENABLED.key(), true)
  .option(HIVE_SYNC_MODE.key(), "HMS")
  .option(HIVE_TABLE.key(), "ht_hudi_11_1_metadata")
  .option("hoodie.table.name", "ht_hudi_11_1_metadata")
  .option(KEYGENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option(OPERATION.key(), "upsert")
  .option(PRECOMBINE_FIELD.key(), "sequence")
  .option(RECORDKEY_FIELD.key(), "id")
  .option(TABLE_NAME.key(), "ht_hudi_11_1_metadata")
  .option("hoodie.index.type","BLOOM")
  .option("hoodie.metadata.enable", true)
  .mode("append")
  .save("hdfs:///.../hudi_11_1_metadata")

Stacktrace

Caused by: java.lang.ExceptionInInitializerError
        at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
        at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
        at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
        at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
        at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
        ... 28 more
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (2.2.3.7.1.7.0-551), this version is 2.4.9
        at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:74)
        at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:84)
        at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:98)
        at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Context.<init>(Context.java:44)
        at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<init>(Encryption.java:110)
        at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<clinit>(Encryption.java:107)
        ... 36 more
........
22/08/12 08:19:20 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9) (hdl-w05.charite.de executor 1): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
        at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context
        at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
        at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
        at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
        at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
        at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 21 (12 by maintainers)

Top GitHub Comments

1 reaction
rbtrtr commented, Sep 19, 2022

Thanks all. In the end, the issue was that Cloudera provides its own configuration for the parameter hbase.defaults.for.version.skip. After overriding that parameter in Cloudera Manager, it works.
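
The override works because of how the version check behaves. Below is a minimal Scala sketch of that logic, reconstructed from the stack trace and the standard HBase behavior; the real code lives in the relocated class org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration:

import org.apache.hadoop.conf.Configuration

// Sketch of HBaseConfiguration.checkDefaultsVersion: the skip flag
// short-circuits the comparison between the hbase-default.xml found on the
// classpath (Cloudera's 2.2.3.7.1.7.0-551) and the compiled-in version (2.4.9).
def checkDefaultsVersion(conf: Configuration, thisVersion: String): Unit = {
  if (conf.getBoolean("hbase.defaults.for.version.skip", false)) return
  val defaultsVersion = conf.get("hbase.defaults.for.version")
  if (defaultsVersion != thisVersion)
    throw new RuntimeException(
      s"hbase-default.xml file seems to be for an older version of HBase ($defaultsVersion), " +
        s"this version is $thisVersion")
}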

1 reaction
rbtrtr commented, Aug 16, 2022

Thanks for your response. I had already seen this in the pom file; that’s why we started our spark-shell with the HBase jars in that version, but unfortunately with no success. I guess they end up last on the classpath.

I was wondering why the HBase lib is mandatory for the metadata table. Where is the metadata stored when we’re setting this option? .option("hoodie.index.type","BLOOM")
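
The stack trace itself hints at the answer: the metadata table persists its records as HFile-format log blocks (HoodieHFileDataBlock in the trace), so the relocated HBase HFile writer runs regardless of the configured index type. As for the classpath-ordering guess, one way to check it is to ask the classloader directly (a hypothetical debugging snippet, not from the thread):

import scala.collection.JavaConverters._

// Lists every hbase-default.xml visible to the classloader, in resolution
// order; the first hit is the one a freshly created Configuration will load.
val copies = getClass.getClassLoader.getResources("hbase-default.xml").asScala.toList
copies.foreach(println)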

Read more comments on GitHub >

Top Results From Across the Web

Apache HBase ™ Reference Guide
Commercial technical support for Apache HBase is provided by many Hadoop ... Determines the type of memstore to be used for system tables...

Hbase Soket TimeOut Exception - Cloudera Community - 34858
I'm seeing a similar problem: HBase throws a error when we query one large table. The error is as follows: ERROR: Call id=58,...

HBase throws TableNotFoundException when the table exists
listTableNames() println(s"HBase has the following tables: ${tableNames.map(_. ... User class threw exception: org.apache.hadoop.ipc.

ERROR: "HBase Table 'XYZ' was not found, got - Search
Axon scanner fails with the following exception in a Kerberos enabled Catalog Service. Table 'hbase:acl' was not found, got: SYSTEM:STATS.

How Do I Set the TTL for an HBase Table?_ ... - 华为云
Set the time to live (TTL) when creating a table:Create the t_task_log table, set the column family to f, and set the TTL...
