[SUPPORT] Issues when writing a DataFrame to Hudi format with Hive syncing enabled for AWS Athena and Glue metadata persistence
Detailed Description

Facing issues when trying to write data in Hudi format to an S3 location, with the following Hudi write options:
hudi_write_table_options = {
    "hoodie.table.name": "hudi_data_test",
    "hoodie.datasource.write.table.type": write_type,
    "hoodie.datasource.write.storage.type": write_type,
    "hoodie.datasource.write.recordkey.field": ",".join(primary_keys),
    "hoodie.datasource.write.partitionpath.field": ",".join(partition_keys),
    "hoodie.datasource.write.precombine.field": ",".join(precombine_key),
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.operation": insert_type,
    "hoodie.consistency.check.enabled": "true",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.auto_create_database": "true",
    "hoodie.datasource.hive_sync.database": "hudidatalake",
    "hoodie.datasource.hive_sync.table": "hudi_data_test",
    "hoodie.datasource.hive_sync.partition_fields": ",".join(partition_keys),
    "hoodie.datasource.hive_sync.jdbcurl": "jdbc:hive2://localhost:10000",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}
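A minimal sketch of how these options are typically applied to the DataFrame write; df, the save mode, and the target S3 path below are placeholders, not details taken from the issue:

    # Hypothetical write call; bucket/path and mode are assumptions
    df.write \
        .format("hudi") \
        .options(**hudi_write_table_options) \
        .mode("append") \
        .save("s3://your-bucket/path/to/hudi_data_test")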
This worked perfectly a day back, with data synchronized and queryable from AWS Athena, but today the application started throwing exceptions. The exceptions and full stack traces are below.
Environment Description (on EMR)

- Hudi version: 0.9.0 (hudi-spark3-bundle_2.12-0.9.0.jar)
- Spark version: 3.1.2
- Hive version: Hive 3.1.2-amzn-5
- Hadoop version: Hadoop 3.2.1-amzn-4
- Storage (HDFS/S3/GCS…): S3
- Running on Docker? (yes/no): no
Additional context

We are using spark-submit to submit the Spark application, which contains the logic for writing data in Hudi format, to EMR in cluster mode.
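For reference, a typical submission would look like the sketch below; the script name, jar path, and serializer conf are placeholders and assumptions, not taken from the issue:

    spark-submit \
      --deploy-mode cluster \
      --jars /usr/lib/hudi/hudi-spark3-bundle_2.12-0.9.0.jar \
      --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
      write_to_hudi.py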
Stacktrace

Scraping through the YARN logs, I could extract the exceptions below:
ERROR HoodieTable: Got exception while waiting for files to show up
java.util.concurrent.TimeoutException: Timed out waiting for files to adhere to event APPEAR
at org.apache.hudi.common.fs.FailSafeConsistencyGuard.retryTillSuccess(FailSafeConsistencyGuard.java:163)
at org.apache.hudi.common.fs.FailSafeConsistencyGuard.waitForFilesVisibility(FailSafeConsistencyGuard.java:84)
at org.apache.hudi.common.fs.FailSafeConsistencyGuard.waitTillAllFilesAppear(FailSafeConsistencyGuard.java:65)
at org.apache.hudi.common.fs.ConsistencyGuard.waitTill(ConsistencyGuard.java:80)
at org.apache.hudi.table.HoodieTable.waitForCondition(HoodieTable.java:578)
at org.apache.hudi.table.HoodieTable.lambda$waitForAllFiles$6537235c$1(HoodieTable.java:566)
at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
at scala.collection.AbstractIterator.to(Iterator.scala:1429)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1429)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2281)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
followed by the 503 Slow Down exceptions below:
ERROR BulkInsertDataInternalWriterHelper: Global error thrown while trying to write records in HoodieRowCreateHandle
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3ZPMHVXR7ACCC6; S3 Extended Request ID: xJVRCXk4gMFkuG2q+4s9Z/f14VYDebtjA+tYvWL6Depi4gG3KjEvOOKtz6iMleZcse4S/nCKOzM=; Proxy: null), S3 Extended Request ID: xJVRCXk4gMFkuG2q+4s9Z/f14VYDebtjA+tYvWL6Depi4gG3KjEvOOKtz6iMleZcse4S/nCKOzM=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:26)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:12)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor$CallPerformer.call(GlobalS3Executor.java:108)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:135)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:191)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:186)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getS3ObjectMetadata(ConsistencyCheckerS3FileSystem.java:947)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatusFromS3CheckingConsistencyIfEnabled(ConsistencyCheckerS3FileSystem.java:506)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:443)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:436)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.mkdir(ConsistencyCheckerS3FileSystem.java:755)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.mkdirs(ConsistencyCheckerS3FileSystem.java:747)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.lambda$newMetadataAdder$0(ConsistencyCheckerS3FileSystem.java:222)
at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.MetadataAdder.addParentDirectoriesMetadata(MetadataAdder.java:87)
at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.MetadataAdder.afterUploadCompletion(MetadataAdder.java:70)
at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.ChainedUploadObserver.afterUploadCompletion(ChainedUploadObserver.java:25)
at com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.DefaultSinglePartUploadDispatcher.create(DefaultSinglePartUploadDispatcher.java:44)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.uploadSingleCompleteFile(S3FSOutputStream.java:386)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.doClose(S3FSOutputStream.java:225)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:201)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:73)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:102)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:73)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:102)
at org.apache.hudi.common.fs.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:75)
at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:200)
at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:181)
at org.apache.hudi.table.marker.WriteMarkers.create(WriteMarkers.java:65)
at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.createMarkerFile(HoodieRowCreateHandle.java:191)
at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.<init>(HoodieRowCreateHandle.java:98)
at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.getRowCreateHandle(BulkInsertDataInternalWriterHelper.java:165)
at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.write(BulkInsertDataInternalWriterHelper.java:141)
at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:48)
at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:35)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/10/07 07:13:12 ERROR Utils: Aborting task
[same AmazonS3Exception: Slow Down (Status Code: 503) and stack trace as above, repeated verbatim]
21/10/07 07:13:12 ERROR DataWritingSparkTask: Aborting commit for partition 219 (task 527, attempt 0, stage 3.0)
21/10/07 07:13:12 ERROR DataWritingSparkTask: Aborted commit for partition 219 (task 527, attempt 0, stage 3.0)
21/10/07 07:13:12 WARN HoodiePartitionMetadata: Error trying to clean up temporary files for s3://bucket-name/tablename/partitionkey=18455
java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3JTH3ZZNF1S6GV; S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=; Proxy: null), S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=
at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:230)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1690)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:436)
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:549)
at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:110)
at org.apache.hudi.io.storage.row.HoodieRowCreateHandle.<init>(HoodieRowCreateHandle.java:97)
at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.getRowCreateHandle(BulkInsertDataInternalWriterHelper.java:165)
at org.apache.hudi.internal.BulkInsertDataInternalWriterHelper.write(BulkInsertDataInternalWriterHelper.java:141)
at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:48)
at org.apache.hudi.spark3.internal.HoodieBulkInsertDataInternalWriter.write(HoodieBulkInsertDataInternalWriter.java:35)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: FS3JTH3ZZNF1S6GV; S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=; Proxy: null), S3 Extended Request ID: a4sRaJ/WD4WFjxK+sGvHimrthtyxBNyHcMCDmwt+e/vh9snondkR/oeuQEFOno5d7HZ/6o2Trro=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:26)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:12)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor$CallPerformer.call(GlobalS3Executor.java:108)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:135)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:191)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:186)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getS3ObjectMetadata(ConsistencyCheckerS3FileSystem.java:947)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatusFromS3CheckingConsistencyIfEnabled(ConsistencyCheckerS3FileSystem.java:478)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:443)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:436)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy39.getFileStatus(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:219)
... 21 more
Is this a transient issue, or is there some configuration to enable retries?
Top GitHub Comments
Closing this. We stopped encountering the issue after regulating the write parallelism and spreading the data across multiple buckets instead of a single bucket for our application; we also added a few of the cluster-level configurations suggested in the link above.
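The comment above mentions regulating parallelism; a hedged sketch of the kinds of Hudi write options commonly tuned to ease S3 503 throttling follows (the values are illustrative assumptions, not the reporter's actual settings):

    # Illustrative tuning; values are assumptions, not from the issue
    throttling_tuning_options = {
        # Bound how many tasks write files to S3 concurrently
        "hoodie.bulkinsert.shuffle.parallelism": "200",
        "hoodie.insert.shuffle.parallelism": "200",
        "hoodie.upsert.shuffle.parallelism": "200",
        # Let the consistency guard back off and retry visibility checks
        # instead of timing out quickly under throttling
        "hoodie.consistency.check.enabled": "true",
        "hoodie.consistency.check.initial_interval_ms": "2000",
        "hoodie.consistency.check.max_interval_ms": "300000",
        "hoodie.consistency.check.max_checks": "10",
    }

These would be merged into hudi_write_table_options before the write.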
@bhasudha @umehrot2: any suggestions on how to circumvent the S3 throttling issue? Appreciate your inputs.
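On the EMR side, a commonly suggested mitigation for 503 Slow Down errors is raising the EMRFS retry ceiling through cluster configuration; a minimal sketch, where the retry count of 20 is an illustrative assumption:

    # Hypothetical emrfs-site classification; the value 20 is illustrative
    emrfs_retry_config = [
        {
            "Classification": "emrfs-site",
            "Properties": {
                # Retry throttled S3 calls more times before surfacing the 503
                "fs.s3.maxRetries": "20"
            },
        }
    ]

This list can be supplied as the Configurations argument when launching the cluster (for example via boto3's run_job_flow) or applied as an EMR configuration JSON.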