[SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided
Describe the problem you faced
Using DynamoDB as the lock provider for concurrent writes fails with an exception on Hudi 0.10.1 if hoodie.write.lock.dynamodb.endpoint_url is not provided.
According to the documentation, this option is only present from 0.11.0 onwards and should be optional. Providing my region's DynamoDB endpoint as the option value works around the error, but this behaviour is unexpected.
To Reproduce
Steps to reproduce the behavior:
- Build Hudi from 0.10.1 source files
- Provide the following Hudi write options as part of a PySpark script:
  'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
  'hoodie.cleaner.policy.failed.writes': 'LAZY',
  'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
  'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
  'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>'
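For reference, a minimal sketch of the write configuration described above, with the endpoint-URL workaround included. The table name, partition key, and region in the endpoint are placeholders, not values from the original report:

```python
# Hudi write options that trigger the error on 0.10.1. The endpoint_url
# entry is the workaround: on 0.10.1 it is (unexpectedly) required.
# <TABLE_NAME>, <KEY_NAME>, and the region are placeholders.
hudi_options = {
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.cleaner.policy.failed.writes': 'LAZY',
    'hoodie.write.lock.provider':
        'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
    'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
    'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
    # Workaround for 0.10.1; per the 0.11.0 docs this should be optional:
    'hoodie.write.lock.dynamodb.endpoint_url':
        'https://dynamodb.eu-west-1.amazonaws.com',
}

# In the Glue/PySpark job these options would then be passed to the writer:
# df.write.format('hudi').options(**hudi_options).mode('append').save(path)
```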
Expected behavior
Table created in DynamoDB to provide locking functionality for concurrent writes.
Environment Description
- Hudi version : 0.10.1
- Spark version : 3.1.1
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : No
Additional context
PySpark application running as a Glue ETL job. Once the appropriate endpoint URL is added to the write options, the lock table is created as expected.
Stacktrace
org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
... 47 more
Caused by: java.lang.IllegalArgumentException: Property hoodie.write.lock.dynamodb.endpoint_url not found
at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:48)
at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:58)
at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.getDynamoDBClient(DynamoDBBasedLockProvider.java:159)
at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:87)
at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:77)
... 52 more
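The root cause at the bottom of the trace is that DynamoDBBasedLockProvider reads the endpoint URL via TypedProperties.getString(key), which throws when the property is absent, instead of an overload that accepts a default. A rough Python sketch of the difference between the two lookups (class and method names here are illustrative stand-ins, not Hudi's actual API):

```python
class TypedProperties(dict):
    """Illustrative stand-in for Hudi's TypedProperties (not the real class)."""

    def get_string(self, key, default=None):
        # Mirrors getString(key): raise if the property is missing...
        if default is None:
            if key not in self:
                raise ValueError(f"Property {key} not found")
            return self[key]
        # ...versus getString(key, default): fall back instead of raising.
        return self.get(key, default)


props = TypedProperties({'hoodie.write.lock.dynamodb.table': 'locks'})

# 0.10.1 behaviour: a required-key lookup on the missing endpoint raises,
# which surfaces as the IllegalArgumentException in the stack trace.
try:
    props.get_string('hoodie.write.lock.dynamodb.endpoint_url')
except ValueError as e:
    print(e)

# Intended behaviour: treat the endpoint as optional with a fallback value
# (the URL below is just an example default).
endpoint = props.get_string('hoodie.write.lock.dynamodb.endpoint_url',
                            'https://dynamodb.us-east-1.amazonaws.com')
```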
Issue Analytics
- State:
- Created: 2 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
@lewyh @nsivabalan Indeed, I have spotted an error in my PR; will propose a fix ASAP.
Until the fix is merged, that file can be copied into your project to override it: https://github.com/apache/hudi/blob/21b218f9569a60c94aa2155b2093205382da40d3/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java