Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

See original GitHub issue

Describe the problem you faced

Using DynamoDB as the lock provider for concurrent writes results in an error stating java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

To Reproduce

Steps to reproduce the behaviour:

Build Hudi from 0.10.1 source files
Provide the following Hudi write options as part of a PySpark script: ‘hoodie.write.concurrency.mode’: ‘optimistic_concurrency_control’, ‘hoodie.cleaner.policy.failed.writes’: ‘LAZY’, ‘hoodie.write.lock.provider’: ‘org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider’, ‘hoodie.write.lock.dynamodb.table’: ‘<TABLE_NAME>’, ‘hoodie.write.lock.dynamodb.partition_key’: ‘<KEY_NAME>’

Expected behavior

Job is able to acquire lock.

Environment Description

Hudi version : 0.10.1
Spark version : 3.1.2
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS…) : S3
Running on Docker? (yes/no) : no

Additional context

Using on Glue 3.0. Dynamo DB table is already created manually and role assigned to the job has all the permissions to operate on the table.

'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
'hoodie.cleaner.policy.failed.writes': 'LAZY',
'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-east-1.amazonaws.com',
'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
'hoodie.write.lock.dynamodb.region': 'us-east-1',

Tried both with and without providing “hoodie.write.lock.dynamodb.endpoint_url”

Jars included:

extra-jars/hudi-spark3.1.2-bundle_2.12-0.10.1.jar extra-jars/spark-avro_2.12-3.1.2.jar

Job runs fine without concurrency mode configurations.

Stacktrace

2022-04-27 14:13:05,812 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/glue_process_bundle.py", line 17, in <module>
    start_process(glue_ctx, config, glue_catalog_svc)
  File "/tmp/glue_process_bundle.zip/jobs/process.py", line 180, in start_signal_process
    load(final_df, config)
  File "/tmp/glue_process_bundle.zip/jobs/process.py", line 99, in load
    df.write.format("hudi").options(**hudi_options).mode("append").save(config.params.processed_bucket)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1109, in save
    self._jwrite.save(path)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o255.save.
: java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
	at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
	at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
	at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
	at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.dynamodbv2.model.LockNotGrantedException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 51 more

Since this is NoClassDefFoundError, was wondering if there are some additional sdk jars that I need to include to use this functionality?

Thanks.

Issue Analytics

State:
Created a year ago
Comments:15 (10 by maintainers)

Top GitHub Comments

2reactions

umehrot2commented, May 2, 2022

Created a couple of Jiras to improve the naming/docs and to package the jars in a bundle so customers don’t have to pass these AWS specific jars manually:

https://issues.apache.org/jira/browse/HUDI-4011 https://issues.apache.org/jira/browse/HUDI-4010

0reactions

jdattanicommented, May 4, 2022

@yihua @umehrot2 The issue was resolved after defining dynamodb partition key name to “key”. Thanks for your inputs.

Top Results From Across the Web

[GitHub] [hudi] jdattani commented on issue #5451

... on issue #5451: [SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/ ...

Resolve parquet-avro conflict in hudi-gcp-bundle and hudi ...

HoodieException : org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: ...

5 - Stack Overflow

I am trying to read data from hudi but getting below error. Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi.

Hudi - Amazon EMR - AWS Documentation

Hudi allows data to be ingested and updated in near real time. ... performed on the dataset to help ensure that the actions...

Troubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.

Start Free

Top Related Reddit Thread

No results found

Top Related Tweet

No results found

Top Related Dev.to Post

No results found

[SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

Issue Analytics

Top GitHub Comments

Top Results From Across the Web

Top Related Medium Post

Top Related StackOverflow Question

Troubleshoot Live Code

Top Related Reddit Thread

Top Related Hackernoon Post

Top Related Tweet

Top Related Dev.to Post

Top Related Hashnode Post

Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

[SUPPORT] Flink Date type as partition field