[SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
See original GitHub issueDescribe the problem you faced
Using DynamoDB as the lock provider for concurrent writes results in an error stating java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
To Reproduce
Steps to reproduce the behaviour:
-
Build Hudi from 0.10.1 source files
-
Provide the following Hudi write options as part of a PySpark script: ‘hoodie.write.concurrency.mode’: ‘optimistic_concurrency_control’, ‘hoodie.cleaner.policy.failed.writes’: ‘LAZY’, ‘hoodie.write.lock.provider’: ‘org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider’, ‘hoodie.write.lock.dynamodb.table’: ‘<TABLE_NAME>’, ‘hoodie.write.lock.dynamodb.partition_key’: ‘<KEY_NAME>’
Expected behavior
Job is able to acquire lock.
Environment Description
-
Hudi version : 0.10.1
-
Spark version : 3.1.2
-
Hive version :
-
Hadoop version :
-
Storage (HDFS/S3/GCS…) : S3
-
Running on Docker? (yes/no) : no
Additional context
Using on Glue 3.0. Dynamo DB table is already created manually and role assigned to the job has all the permissions to operate on the table.
'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
'hoodie.cleaner.policy.failed.writes': 'LAZY',
'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-east-1.amazonaws.com',
'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
'hoodie.write.lock.dynamodb.region': 'us-east-1',
Tried both with and without providing “hoodie.write.lock.dynamodb.endpoint_url”
Jars included:
extra-jars/hudi-spark3.1.2-bundle_2.12-0.10.1.jar extra-jars/spark-avro_2.12-3.1.2.jar
Job runs fine without concurrency mode configurations.
Stacktrace
2022-04-27 14:13:05,812 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
File "/tmp/glue_process_bundle.py", line 17, in <module>
start_process(glue_ctx, config, glue_catalog_svc)
File "/tmp/glue_process_bundle.zip/jobs/process.py", line 180, in start_signal_process
load(final_df, config)
File "/tmp/glue_process_bundle.zip/jobs/process.py", line 99, in load
df.write.format("hudi").options(**hudi_options).mode("append").save(config.params.processed_bucket)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1109, in save
self._jwrite.save(path)
File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o255.save.
: java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.dynamodbv2.model.LockNotGrantedException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 51 more
Since this is NoClassDefFoundError, was wondering if there are some additional sdk jars that I need to include to use this functionality?
Thanks.
Issue Analytics
- State:
- Created a year ago
- Comments:15 (10 by maintainers)
Top GitHub Comments
Created a couple of Jiras to improve the naming/docs and to package the jars in a bundle so customers don’t have to pass these AWS specific jars manually:
https://issues.apache.org/jira/browse/HUDI-4011 https://issues.apache.org/jira/browse/HUDI-4010
@yihua @umehrot2 The issue was resolved after defining dynamodb partition key name to “key”. Thanks for your inputs.