
Problem with special characters in file path

See original GitHub issue

Hi Ruslan,

I am a colleague of @wrangel. We have problems with special characters in the path of data files (anything escaped in a URI). Here is the full stack trace:

java.io.FileNotFoundException: File /mnt/landingzone/source/daily/1007_rs/2019-09-20/with%20space does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:539)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:752)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:529)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
    at za.co.absa.cobrix.spark.cobol.utils.HDFSUtils$.getBlocksLocations(HDFSUtils.scala:56)
    at za.co.absa.cobrix.spark.cobol.utils.HDFSUtils$.getBlocksLocations(HDFSUtils.scala:37)
    at za.co.absa.cobrix.spark.cobol.source.index.IndexBuilder$$anonfun$2.apply(IndexBuilder.scala:146)
    at za.co.absa.cobrix.spark.cobol.source.index.IndexBuilder$$anonfun$2.apply(IndexBuilder.scala:145)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at za.co.absa.cobrix.spark.cobol.source.index.IndexBuilder$.toRDDWithLocality(IndexBuilder.scala:145)
    at za.co.absa.cobrix.spark.cobol.source.index.IndexBuilder$.buildIndexForVarLenReaderWithFullLocality(IndexBuilder.scala:69)
    at za.co.absa.cobrix.spark.cobol.source.index.IndexBuilder$.buildIndex(IndexBuilder.scala:50)
    at za.co.absa.cobrix.spark.cobol.source.CobolRelation.indexes$lzycompute(CobolRelation.scala:80)
    at za.co.absa.cobrix.spark.cobol.source.CobolRelation.indexes(CobolRelation.scala:80)
    at za.co.absa.cobrix.spark.cobol.source.CobolRelation.buildScan(CobolRelation.scala:92)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:308)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:100)
    at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:67)
    at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:91)
    at org.apache.spark.sql.Dataset.persist(Dataset.scala:2968)
    at org.apache.spark.sql.Dataset.cache(Dataset.scala:2978)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.cobolparser.CobolFileReader.run(CobolFileReader.scala:54)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.cobolparser.CobolParser.parse(CobolParser.scala:18)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$$anonfun$1.apply(Stage1.scala:53)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$$anonfun$1.apply(Stage1.scala:49)
    at scala.util.Try$.apply(Try.scala:192)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$.ch$swisscard$bigdataanalyticspoc$app$stage1$Stage1$$parseFile(Stage1.scala:49)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$$anonfun$run$1$$anonfun$apply$1.apply(Stage1.scala:35)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$$anonfun$run$1$$anonfun$apply$1.apply(Stage1.scala:31)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$$anonfun$run$1.apply(Stage1.scala:30)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$$anonfun$run$1.apply(Stage1.scala:22)
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
    at ch.swisscard.bigdataanalyticspoc.app.stage1.Stage1$.run(Stage1.scala:22)
    at ch.swisscard.bigdataanalyticspoc.app.Main$.delayedEndpoint$ch$swisscard$bigdataanalyticspoc$app$Main$1(Main.scala:15)
    at ch.swisscard.bigdataanalyticspoc.app.Main$delayedInit$body.apply(Main.scala:11)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at ch.swisscard.bigdataanalyticspoc.app.Main$.main(Main.scala:11)
    at ch.swisscard.bigdataanalyticspoc.app.Main.main(Main.scala)

We did some digging and found that the path is converted into a URI once too often. Cobrix converts the path in FileUtils.getFiles():84: it is turned into a URI and then into a raw path, which retains the URI's character escaping ('with space' becomes 'with%20space'). This path is then handed to Hadoop, which converts the Path into a URI again ('with%20space' becomes 'with%2520space'). getPath() later decodes this once ('with%2520space' becomes 'with%20space' again), and that string is used as the path of a java.io.File object. See RawLocalFileSystem.pathToFile():86.
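
To make the double encoding concrete, here is a minimal, self-contained Scala sketch using only java.net.URI. The path value is hypothetical, and the two constructor calls stand in for the conversions that Cobrix and Hadoop each perform internally, as described above:

import java.net.URI

object DoubleEncodingDemo extends App {
  val original = "/data/with space"                        // hypothetical path containing a space

  // First conversion: the multi-argument URI constructor percent-encodes the space.
  val once = new URI("file", null, original, null)
  println(once.getRawPath)                                 // /data/with%20space

  // Second conversion: encoding the already-escaped raw path escapes the '%' itself.
  val twice = new URI("file", null, once.getRawPath, null)
  println(twice.getRawPath)                                // /data/with%2520space

  // getPath() decodes only one level, leaving the spurious %20 in the name.
  println(twice.getPath)                                   // /data/with%20space
}

A java.io.File constructed from that final string refers to a file literally named "with%20space", which is why the exception above complains about a path containing %20 even though the real file is named "with space".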

We would appreciate your help in resolving this.

Thanks, Patrick

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
patrick-winter-swisscard commented, Nov 15, 2019

I can now confirm that it works for us as well.

Thanks again, Patrick

0 reactions
yruslan commented, Nov 15, 2019

The snapshot repository is not searched by default; you can enable it temporarily by adding this profile to your Maven configuration:

<profiles>
  <profile>
    <id>allow-snapshots</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <repositories>
      <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases><enabled>false</enabled></releases>
        <snapshots><enabled>true</enabled></snapshots>
      </repository>
    </repositories>
  </profile>
</profiles>
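
With this profile active, Maven resolves SNAPSHOT dependency versions (for example, a spark-cobol snapshot build containing this fix; the exact snapshot version depends on what is currently published) from the Sonatype snapshots repository, while the <releases><enabled>false</enabled></releases> setting keeps release artifacts coming from Maven Central.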

The fix worked well on our cluster, so we will release 1.1.1 soon.

Read more comments on GitHub

Top Results From Across the Web

Special Characters You Should Avoid Using and File Path ...
If you receive a message, while downloading, stating that the destination path is too long or that the file cannot be download to...
Characters to Avoid in Filenames and Directories
Keep your filenames to a reasonable length and be sure they are under 31 characters. Most operating systems are case sensitive; always use...
What Special Characters do I need to avoid to successfully ...
Avoiding common illegal filename characters is essential to ensure successful archive. Naming conventions for all files in an archive are important, not only ......
VS unable to open project with some special characters in the ...
Based on your explanation, it sounds like I am missing some information since there isn't a folder with any special characters (only the...
Hack to include special characters in file path in haven
There seems to be an issue with the haven (1.1.1) package when including any type of special character in the file path, including...
