Files not writing to remote HDFS
Spark-Bench version (version number, tag, or git commit hash)
dc78dad
Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)
Spark 2.1.1, Standalone mode
Scala version on your cluster
2.11.6
Your exact configuration file (with system details anonymized for security)
From time to time I commented out either the generation suite or the benchmarking suite to test them separately.
spark-bench = {
  spark-submit-parallel = false
  spark-submit-config = [{
    suites-parallel = false
    workload-suites = [
      {
        descr = "Generating data for the benchmarks to use"
        parallel = false
        repeat = 1 // generate once and done!
        benchmark-output = "console"
        workloads = [
          {
            name = "data-generation-kmeans"
            output = "hdfs://hostname:9000/tmp/spark-bench-test/kmeans-data.parquet"
            rows = 10000
            cols = 14
          }
        ]
      },
      {
        descr = "Classic benchmarking"
        parallel = false
        repeat = 1
        benchmark-output = "console"
        workloads = [
          {
            name = "kmeans"
            input = "hdfs://hostname:9000/tmp/spark-bench-test/kmeans-data.parquet"
          }
        ]
      }
    ]
  }]
}
Relevant stacktrace
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://hostname:9000/tmp/spark-bench-test/kmeans-data.parquet, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.pathExists(SparkFuncs.scala:87)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyCanWrite(SparkFuncs.scala:41)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyCanWriteOrThrow(SparkFuncs.scala:102)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyOutput(SparkFuncs.scala:34)
at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:49)
at com.ibm.sparktc.sparkbench.datageneration.mlgenerator.KMeansDataGen.run(KMeansDataGen.scala:44)
Description of your problem and any other relevant info
The actual error occurs at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.pathExists(SparkFuncs.scala:87).
For a quick test I changed com.ibm.sparktc.sparkbench.utils.SparkFuncs.pathExists
as follows:
def pathExists(path: String, spark: SparkSession): Boolean = { false }
and data generation ran successfully. After that I changed it to return true, commented out generation, and "Classic benchmarking" also ran successfully. So I believe that either
getHadoopFS(path, spark).exists(new org.apache.hadoop.fs.Path(path))
doesn't work as expected for both files and folders, or com.ibm.sparktc.sparkbench.utils.SparkFuncs$.pathExists(SparkFuncs.scala:87)
is used somewhere in an erroneous way.
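The "Wrong FS: hdfs://…, expected: file:///" message suggests that pathExists ends up asking the default (local) FileSystem about an hdfs:// path. A minimal sketch of one possible fix, resolving the FileSystem from the path's own URI instead; this is only my assumption about the approach, not necessarily the change that actually landed:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Resolve the FileSystem from the path's own scheme (hdfs://, file://, ...)
// rather than from fs.defaultFS, so remote paths are checked against
// the right filesystem.
def pathExists(path: String, spark: SparkSession): Boolean = {
  val hadoopPath = new Path(path)
  val fs = hadoopPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
  fs.exists(hadoopPath)
}

With a variant like this, the exists check runs against the filesystem that actually owns the path, so both the data-generation output check and the benchmark input check would hit HDFS instead of the local filesystem.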
Top GitHub Comments
Closed by #155. A distribution will be created and uploaded to GitHub Releases in ~15 minutes, or check out master. Thank you @AndriiSushko for your thorough and clear bug report!

@AndriiSushko I would be super happy to have you as a contributor! Send me an email if you wanna chat about design or Scala or anything. 😃