
[SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

See original GitHub issue

Describe the problem you faced

Using DynamoDB as the lock provider for concurrent writes results in an error if hoodie.write.lock.dynamodb.endpoint_url is not provided when using Hudi 0.10.1.

According to the documentation, this option is only introduced in 0.11.0 and should be optional even then. Providing my region’s DynamoDB endpoint as the option value works around the error, but this behaviour is unexpected.

To Reproduce

Steps to reproduce the behavior:

  1. Build Hudi from 0.10.1 source files
  2. Provide the following Hudi write options as part of a PySpark script (see the sketch below):
     'hoodie.write.concurrency.mode': 'optimistic_concurrency_control'
     'hoodie.cleaner.policy.failed.writes': 'LAZY'
     'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider'
     'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>'
     'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>'
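
For reference, a minimal PySpark sketch of how these options might be wired into a write. The table name, record key, precombine field, sample data, and S3 path are illustrative placeholders that are not part of the original report; only the lock-related options are quoted from it, and the job is assumed to have the Hudi 0.10.1 bundle plus hudi-aws on the classpath.

    from pyspark.sql import SparkSession

    # Hypothetical SparkSession and sample frame; in the report this runs inside a Glue ETL job.
    spark = SparkSession.builder.appName("hudi-dynamodb-lock-repro").getOrCreate()
    df = spark.createDataFrame([(1, "a", 1000)], ["id", "name", "ts"])

    hudi_options = {
        # Placeholder table settings, not part of the original report:
        'hoodie.table.name': 'demo_table',
        'hoodie.datasource.write.recordkey.field': 'id',
        'hoodie.datasource.write.precombine.field': 'ts',
        # Options quoted in step 2 above:
        'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
        'hoodie.cleaner.policy.failed.writes': 'LAZY',
        'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
        'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
        'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
    }

    # On 0.10.1 this write fails while ending the transaction with
    # "Property hoodie.write.lock.dynamodb.endpoint_url not found" (see stacktrace below).
    (df.write
        .format('hudi')
        .options(**hudi_options)
        .mode('append')
        .save('s3://<BUCKET>/<PREFIX>/demo_table/'))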

Expected behavior

Table created in DynamoDB to provide locking functionality for concurrent writes.

Environment Description

  • Hudi version : 0.10.1

  • Spark version : 3.1.1

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : No

Additional context

The PySpark application runs as an AWS Glue ETL job. Once the appropriate endpoint URL is added to the options, the lock table is created as expected.
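
A sketch of that workaround; the eu-west-1 URL is an assumed example, not the region from the report, so substitute the endpoint for your own region.

    # Workaround reported above: on 0.10.1, explicitly supplying the regional
    # DynamoDB endpoint lets the lock table be created. The eu-west-1 URL is an
    # assumed example value.
    lock_options = {
        'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
        'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>',
        'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>',
        'hoodie.write.lock.dynamodb.endpoint_url': 'https://dynamodb.eu-west-1.amazonaws.com',
    }
    # Merge into the write options from the reproduction sketch, e.g.:
    # hudi_options.update(lock_options)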

Stacktrace

org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
	at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
	at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
	at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
	at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
	... 47 more
Caused by: java.lang.IllegalArgumentException: Property hoodie.write.lock.dynamodb.endpoint_url not found
	at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:48)
	at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:58)
	at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.getDynamoDBClient(DynamoDBBasedLockProvider.java:159)
	at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:87)
	at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:77)
	... 52 more
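
Reading the bottom of the trace: the DynamoDBBasedLockProvider constructor calls getDynamoDBClient, which fetches the endpoint URL through TypedProperties.getString with no default, so a missing key throws. A simplified Python analog of that required-key lookup (not Hudi code, purely illustrative):

    # Simplified analog of a required-property lookup with no default:
    # a missing key raises, mirroring the IllegalArgumentException in the trace.
    props = {'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>'}  # endpoint_url absent

    def get_string(properties, key):
        if key not in properties:
            raise ValueError(f"Property {key} not found")
        return str(properties[key])

    try:
        get_string(props, 'hoodie.write.lock.dynamodb.endpoint_url')
    except ValueError as err:
        print(err)  # Property hoodie.write.lock.dynamodb.endpoint_url not found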

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
parisni commented, Mar 6, 2022

@lewyh @nsivabalan Indeed, I have spotted an error in my PR; I will propose a fix ASAP.


