S3 folder paths messed up when running from Windows
I am running my code from a Windows machine to push data to S3. When I try to write the data, I get an error because the stream statistics cannot be found, since stats is passed as null in:
public SizeAwareFSDataOutputStream(FSDataOutputStream out, Runnable closeCallback)
    throws IOException {
  super(out, null);
  this.closeCallback = closeCallback;
}
Because of this failure, the cleanFailedWrites method fails: it expects the path to be Linux-style (forward slashes), but when the job runs on Windows the marker paths end up with Windows separators.
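To illustrate the separator mismatch, here is a minimal sketch (the markerPath helper is hypothetical, not Hudi's actual code): if a path is joined with the platform separator, a Windows run produces backslashes, which never match the forward-slash keys that S3 stores, so a later lookup like the one in cleanFailedWrites finds nothing.

```java
public class SeparatorDemo {
    // Hypothetical helper: join marker-path components with a given separator.
    static String markerPath(String base, String instant, String file, String sep) {
        return base + sep + ".hoodie" + sep + ".temp" + sep + instant + sep + file;
    }

    public static void main(String[] args) {
        // Joined with '/', the path matches what S3 actually stores.
        String linux = markerPath("s3a://gat-datalake-raw-dev/Games2", "20190918145332", "asp", "/");
        // Joined with File.separator on Windows ('\\'), the key no longer matches.
        String windows = markerPath("s3a://gat-datalake-raw-dev/Games2", "20190918145332", "asp", "\\");
        System.out.println(linux);   // s3a://gat-datalake-raw-dev/Games2/.hoodie/.temp/20190918145332/asp
        System.out.println(windows); // s3a://gat-datalake-raw-dev/Games2\.hoodie\.temp\20190918145332\asp
    }
}
```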
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/HariprasadAllaka/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/HariprasadAllaka/.m2/repository/com/github/HariprasadAllaka1612/incubator-hudi/hudi-timeline-server-bundle/playngoplatform-hoodie-0.4.7-gcde16ad-114/hudi-timeline-server-bundle-playngoplatform-hoodie-0.4.7-gcde16ad-114.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20190918145332 due to finalize errors.
at org.apache.hudi.HoodieWriteClient.finalizeWrite(HoodieWriteClient.java:1312)
at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:529)
at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:501)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
at com.playngodataengg.scala.dao.DataAccessS3.writeDataToRefinedHudiS3(DataAccessS3.scala:38)
at com.playngodataengg.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:48)
at com.playngodataengg.scala.action.GameAndProviderData$.main(GameAndProviderData.scala:10)
at com.playngodataengg.scala.action.GameAndProviderData.main(GameAndProviderData.scala)
... 5 more
Caused by: org.apache.hudi.exception.HoodieIOException: No such file or directory: s3a://gat-datalake-raw-dev/Games2/.hoodie/.temp/20190918145332/asp
at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:391)
at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:295)
at org.apache.hudi.table.HoodieMergeOnReadTable.finalizeWrite(HoodieMergeOnReadTable.java:331)
at org.apache.hudi.HoodieWriteClient.finalizeWrite(HoodieWriteClient.java:1303)
... 35 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://gat-datalake-raw-dev/Games2/.hoodie/.temp/20190918145332/asp
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2269)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:3101)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:3082)
at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.listFiles(HoodieWrapperFileSystem.java:531)
at org.apache.hudi.common.util.FSUtils.processFiles(FSUtils.java:245)
at org.apache.hudi.common.util.FSUtils.getAllDataFilesForMarkers(FSUtils.java:213)
at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:340)
... 38 more
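A hedged workaround sketch, assuming the failure is purely the separator mismatch above: normalize backslashes to forward slashes before the string is used as an S3 key. The cleaner fix is to build paths with Hadoop's org.apache.hadoop.fs.Path (which joins components with '/') instead of java.io.File or File.separator; the string-level version below is only for illustration.

```java
public class PathNormalizeDemo {
    // Workaround sketch (not the upstream fix): replace Windows backslashes
    // with forward slashes so the string matches the key layout S3 expects.
    static String toS3Key(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        String raw = "Games2\\.hoodie\\.temp\\20190918145332\\asp";
        System.out.println(toS3Key(raw)); // Games2/.hoodie/.temp/20190918145332/asp
    }
}
```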
Issue Analytics
- State:
- Created 4 years ago
- Comments: 8 (5 by maintainers)
+1. Then please open a JIRA for Windows support and we can continue there. I don't think we have ever tested on Windows…
@HariprasadAllaka1612: Can you run the Hudi unit tests (mvn test) in your Windows setup (without S3) and see if they all pass? That way it would be easier to catch a broader range of issues than going piecemeal.