
S3 folder paths messed up when running from Windows

See original GitHub issue

I am running my code from a Windows machine to push data to S3. When I try to write the data, I get an error where stats cannot be found, because stats is passed as null in

public SizeAwareFSDataOutputStream(FSDataOutputStream out, Runnable closeCallback)
    throws IOException {
  super(out, null);
  this.closeCallback = closeCallback;
}

Because of this failure, the cleanFailedWrites method fails: it expects the path to be Linux-style, with forward slashes.
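The Windows-specific failure can be illustrated with a small sketch (the helper names here are hypothetical, not Hudi code): joining path segments with the platform separator produces backslashes on Windows, but S3 object keys and Hadoop-style paths expect forward slashes, so a listing against the intended prefix fails.

```java
import java.io.File;

public class SeparatorDemo {
    // Hypothetical helper: joining with the platform separator, as code
    // developed and tested only on Linux often does implicitly.
    static String platformJoin(String base, String child) {
        return base + File.separator + child; // "\" on Windows, "/" on Linux
    }

    // Portable helper for S3/HDFS-style paths: always use "/".
    static String s3Join(String base, String child) {
        return base + "/" + child;
    }

    public static void main(String[] args) {
        String base = "s3a://gat-datalake-raw-dev/Games2/.hoodie/.temp";
        // On Windows, platformJoin produces a key containing a backslash.
        // S3A treats "\" as a literal character in the object key, so
        // listing the expected "/"-delimited prefix finds nothing and
        // surfaces as the FileNotFoundException seen in the trace below.
        System.out.println(platformJoin(base, "20190918145332"));
        System.out.println(s3Join(base, "20190918145332"));
    }
}
```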

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/HariprasadAllaka/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/HariprasadAllaka/.m2/repository/com/github/HariprasadAllaka1612/incubator-hudi/hudi-timeline-server-bundle/playngoplatform-hoodie-0.4.7-gcde16ad-114/hudi-timeline-server-bundle-playngoplatform-hoodie-0.4.7-gcde16ad-114.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20190918145332 due to finalize errors.
	at org.apache.hudi.HoodieWriteClient.finalizeWrite(HoodieWriteClient.java:1312)
	at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:529)
	at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
	at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:501)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
	at com.playngodataengg.scala.dao.DataAccessS3.writeDataToRefinedHudiS3(DataAccessS3.scala:38)
	at com.playngodataengg.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:48)
	at com.playngodataengg.scala.action.GameAndProviderData$.main(GameAndProviderData.scala:10)
	at com.playngodataengg.scala.action.GameAndProviderData.main(GameAndProviderData.scala)
	... 5 more
Caused by: org.apache.hudi.exception.HoodieIOException: No such file or directory: s3a://gat-datalake-raw-dev/Games2/.hoodie/.temp/20190918145332/asp
	at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:391)
	at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:295)
	at org.apache.hudi.table.HoodieMergeOnReadTable.finalizeWrite(HoodieMergeOnReadTable.java:331)
	at org.apache.hudi.HoodieWriteClient.finalizeWrite(HoodieWriteClient.java:1303)
	... 35 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://gat-datalake-raw-dev/Games2/.hoodie/.temp/20190918145332/asp
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2269)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:3101)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:3082)
	at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.listFiles(HoodieWrapperFileSystem.java:531)
	at org.apache.hudi.common.util.FSUtils.processFiles(FSUtils.java:245)
	at org.apache.hudi.common.util.FSUtils.getAllDataFilesForMarkers(FSUtils.java:213)
	at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:340)
	... 38 more

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:8 (5 by maintainers)

Top GitHub Comments

1 reaction
vinothchandar commented, Sep 18, 2019

+1 then please open a JIRA for windows support and we can continue there. I don’t think we have ever tested on windows…

1 reaction
bvaradar commented, Sep 18, 2019

@HariprasadAllaka1612: Can you run the Hudi unit tests (mvn test) in your Windows setup (without S3) and see if all the tests pass? That way, it would be easier to catch a broader range of issues than going piecemeal.

