
[SUPPORT] Issues when writing dataframe to hudi format with hive syncing enabled for AWS Athena and Glue metadata persistence

See original GitHub issue

Detailed Description

Facing issues when trying to write a dataframe in Hudi format to an S3 location, with the following Hudi write options:

hudi_write_table_options = {
    "hoodie.table.name": "hudi_data_test",
    "hoodie.datasource.write.table.type": write_type,
    "hoodie.datasource.write.storage.type": write_type,
    "hoodie.datasource.write.recordkey.field": ",".join(primary_keys),
    "hoodie.datasource.write.partitionpath.field": ",".join(partition_keys),
    "hoodie.datasource.write.precombine.field": ",".join(precombine_key),
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.operation": insert_type,
    "hoodie.consistency.check.enabled": "true",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.auto_create_database": "true",
    "hoodie.datasource.hive_sync.database": "hudidatalake",
    "hoodie.datasource.hive_sync.table": "hudi_data_test",
    "hoodie.datasource.hive_sync.partition_fields": ",".join(partition_keys),
    "hoodie.datasource.hive_sync.jdbcurl": "jdbc:hive2://localhost:10000",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor"
}
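
For context, a minimal sketch of how options like these are typically passed to a Spark DataFrame write is shown below; the dataframe, target path, and save mode are illustrative assumptions, not taken from the original job.

# Minimal usage sketch (assumed, not the reporter's exact code).
# df is an existing PySpark DataFrame; the path and save mode are placeholders.
target_path = "s3://bucket-name/hudidatalake/hudi_data_test"  # hypothetical path

(
    df.write.format("hudi")
      .options(**hudi_write_table_options)  # the write options defined above
      .mode("append")                       # assumed save mode
      .save(target_path)
)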

It worked perfectly a day back, with data synchronized and queryable from AWS Athena, but today the application is throwing exceptions. The exception and full stack trace are below.

Environment Description (on EMR)

  • Hudi version : 0.9.0 (hudi-spark3-bundle_2.12-0.9.0.jar)

  • Spark version : 3.1.2

  • Hive version : Hive 3.1.2-amzn-5

  • Hadoop version : Hadoop 3.2.1-amzn-4

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Additional context

We are using spark-submit to submit the Spark application, which contains the logic for writing data in Hudi format, to EMR in cluster mode.

Stacktrace

Going through the YARN logs, I could extract the exceptions below:

ERROR HoodieTable: Got exception while waiting for files to show up
java.util.concurrent.TimeoutException: Timed out waiting for files to adhere to event APPEAR
	at org.apache.hudi.common.fs.FailSafeConsistencyGuard.retryTillSuccess(FailSafeConsistencyGuard.java:163)
	at org.apache.hudi.common.fs.FailSafeConsistencyGuard.waitForFilesVisibility(FailSafeConsistencyGuard.java:84)
	at org.apache.hudi.common.fs.FailSafeConsistencyGuard.waitTillAllFilesAppear(FailSafeConsistencyGuard.java:65)
	at org.apache.hudi.common.fs.ConsistencyGuard.waitTill(ConsistencyGuard.java:80)
	at org.apache.hudi.table.HoodieTable.waitForCondition(HoodieTable.java:578)
	at org.apache.hudi.table.HoodieTable.lambda$waitForAllFiles$6537235c$1(HoodieTable.java:566)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at scala.collection.AbstractIterator.to(Iterator.scala:1429)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1429)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2281)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

followed by the 503 exceptions below:

ERROR BulkInsertDataInternalWriterHelper: Global error thrown while trying to write records in HoodieRowCreateHandle 
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3ZPMHVXR7ACCC6; S3 Extended Request ID: xJVRCXk4gMFkuG2q+4s9Z/f14VYDebtjA+tYvWL6Depi4gG3KjEvOOKtz6iMleZcse4S/nCKOzM=; Proxy: null), S3 Extended Request ID: xJVRCXk4gMFkuG2q+4s9Z/f14VYDebtjA+tYvWL6Depi4gG3KjEvOOKtz6iMleZcse4S/nCKOzM=
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:26)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:12)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor$CallPerformer.call(GlobalS3Executor.java:108)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:135)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:191)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:186)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getS3ObjectMetadata(ConsistencyCheckerS3FileSystem.java:947)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatusFromS3CheckingConsistencyIfEnabled(ConsistencyCheckerS3FileSystem.java:506)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:443)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:436)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.mkdir(ConsistencyCheckerS3FileSystem.java:755)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.mkdirs(ConsistencyCheckerS3FileSystem.java:747)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.lambda$newMetadataAdder$0(ConsistencyCheckerS3FileSystem.java:222)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.MetadataAdder.addParentDirectoriesMetadata(MetadataAdder.java:87)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.MetadataAdder.afterUploadCompletion(MetadataAdder.java:70)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.ChainedUploadObserver.afterUploadCompletion(ChainedUploadObserver.java:25)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.DefaultSinglePartUploadDispatcher.create(DefaultSinglePartUploadDispatcher.java:44)
	at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.uploadSingleCompleteFile(S3FSOutputStream.java:386)
	at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.doClose(S3FSOutputStream.java:225)
	at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:201)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:73)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:102)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:73)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:102)
	at org.apache.hudi.common.fs.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:75)
	at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:200)
	at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:181)
	at org.apache.hudi.table.marker.WriteMarkers.create(WriteMarkers.java:65)
	at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.createMarkerFile(HoodieRowCreateHandle.java:191)
	at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.<init>(HoodieRowCreateHandle.java:98)
	at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.getRowCreateHandle(BulkInsertDataInternalWriterHelper.java:165)
	at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.write(BulkInsertDataInternalWriterHelper.java:141)
	at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:48)
	at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:35)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/10/07 07:13:12 ERROR Utils: Aborting task
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3ZPMHVXR7ACCC6; S3 Extended Request ID: xJVRCXk4gMFkuG2q+4s9Z/f14VYDebtjA+tYvWL6Depi4gG3KjEvOOKtz6iMleZcse4S/nCKOzM=; Proxy: null), S3 Extended Request ID: xJVRCXk4gMFkuG2q+4s9Z/f14VYDebtjA+tYvWL6Depi4gG3KjEvOOKtz6iMleZcse4S/nCKOzM=
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:26)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:12)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor$CallPerformer.call(GlobalS3Executor.java:108)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:135)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:191)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:186)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getS3ObjectMetadata(ConsistencyCheckerS3FileSystem.java:947)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatusFromS3CheckingConsistencyIfEnabled(ConsistencyCheckerS3FileSystem.java:506)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:443)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:436)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.mkdir(ConsistencyCheckerS3FileSystem.java:755)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.mkdirs(ConsistencyCheckerS3FileSystem.java:747)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.lambda$newMetadataAdder$0(ConsistencyCheckerS3FileSystem.java:222)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.MetadataAdder.addParentDirectoriesMetadata(MetadataAdder.java:87)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.MetadataAdder.afterUploadCompletion(MetadataAdder.java:70)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.ChainedUploadObserver.afterUploadCompletion(ChainedUploadObserver.java:25)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.DefaultSinglePartUploadDispatcher.create(DefaultSinglePartUploadDispatcher.java:44)
	at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.uploadSingleCompleteFile(S3FSOutputStream.java:386)
	at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.doClose(S3FSOutputStream.java:225)
	at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:201)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:73)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:102)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:73)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:102)
	at org.apache.hudi.common.fs.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:75)
	at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:200)
	at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:181)
	at org.apache.hudi.table.marker.WriteMarkers.create(WriteMarkers.java:65)
	at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.createMarkerFile(HoodieRowCreateHandle.java:191)
	at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.<init>(HoodieRowCreateHandle.java:98)
	at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.getRowCreateHandle(BulkInsertDataInternalWriterHelper.java:165)
	at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.write(BulkInsertDataInternalWriterHelper.java:141)
	at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:48)
	at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:35)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/10/07 07:13:12 ERROR DataWritingSparkTask: Aborting commit for partition 219 (task 527, attempt 0, stage 3.0)
21/10/07 07:13:12 ERROR DataWritingSparkTask: Aborted commit for partition 219 (task 527, attempt 0, stage 3.0)
21/10/07 07:13:12 WARN HoodiePartitionMetadata: Error trying to clean up temporary files for s3://bucket-name/tablename/partitionkey=18455
java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3JTH3ZZNF1S6GV; S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=; Proxy: null), S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=
	at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:230)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1690)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:436)
	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:549)
	at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:110)
	at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.<init>(HoodieRowCreateHandle.java:97)
	at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.getRowCreateHandle(BulkInsertDataInternalWriterHelper.java:165)
	at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.write(BulkInsertDataInternalWriterHelper.java:141)
	at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:48)
	at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:35)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3JTH3ZZNF1S6GV; S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=; Proxy: null), S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:26)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:12)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor$CallPerformer.call(GlobalS3Executor.java:108)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:135)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:191)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:186)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getS3ObjectMetadata(ConsistencyCheckerS3FileSystem.java:947)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatusFromS3CheckingConsistencyIfEnabled(ConsistencyCheckerS3FileSystem.java:478)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:443)
	at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:436)
	at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy39.getFileStatus(Unknown Source)
	at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:219)
	... 21 more

Is this a transient issue, or is there some configuration to enable retries?
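
For reference, Hudi's consistency guard exposes retry/backoff settings, and EMRFS has its own S3 retry setting; these are the knobs usually tuned for 503 Slow Down errors. The sketch below is illustrative only, and the values are assumptions, not recommendations from this thread.

# Hedged sketch of retry-related options for S3 throttling.
# Values are illustrative placeholders, not the reporter's settings.
throttling_retry_options = {
    # Retries used by Hudi's FailSafeConsistencyGuard (seen in the stack trace above)
    "hoodie.consistency.check.enabled": "true",
    "hoodie.consistency.check.initial_interval_ms": "2000",
    "hoodie.consistency.check.max_interval_ms": "300000",
    "hoodie.consistency.check.max_checks": "7",
}

# EMRFS retries for throttled S3 requests are configured at the cluster level
# (emrfs-site classification), e.g. fs.s3.maxRetries = 20.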

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
absognety commented, Oct 26, 2021

Closing this. We stopped encountering the issue after regulating the parallelism and creating more buckets instead of using a single bucket for our application; we also added a few cluster-level configurations suggested in the link above.
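
To illustrate what "regulating the parallelism" can look like on the Hudi side, the sketch below lists the shuffle-parallelism write options Hudi exposes; the values are hypothetical, not the ones actually used here.

# Hedged sketch: Hudi shuffle-parallelism write options (placeholder values).
parallelism_options = {
    "hoodie.bulkinsert.shuffle.parallelism": "200",
    "hoodie.insert.shuffle.parallelism": "200",
    "hoodie.upsert.shuffle.parallelism": "200",
}

# Lower parallelism reduces the number of concurrent S3 requests per write;
# these can be merged into the write options before saving.
hudi_write_table_options.update(parallelism_options)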

1 reaction
nsivabalan commented, Oct 14, 2021

@bhasudha @umehrot2: any suggestions on how to circumvent the S3 throttling issue? Appreciate your inputs.

Read more comments on GitHub >

Top Results From Across the Web

  • [SUPPORT] AWS Glue 3.0 fail to write dataset with hudi (hive ...
    Describe the problem you faced: I was trying to use hudi with AWS Glue. At first, I create a simple dataframe from pyspark.sql...

  • Using Athena to query Apache Hudi datasets
    Data sets managed by Hudi are stored in S3 using open storage formats. Currently, Athena can read compacted Hudi datasets but not write...

  • Using the Hudi framework in AWS Glue
    Describes the settings available for interacting with data using the Hudi framework in AWS Glue.

  • Considerations and limitations for using Hudi on Amazon EMR
    For Presto to correctly interpret Hudi dataset columns, set the hive.parquet_use_column_names value to true. To set the value for a session, in...

  • Writing to Apache Hudi tables using AWS Glue Custom ...
    In this post, we create a Hudi table with an initial load of over 200 million records and then update 70 million of...
