[BUG] NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser when reading a folder
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
Hello,
I am trying to use the latest version of the library on a Databricks cluster to read a whole folder of Excel files directly. Here is the stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 27) (10.131.243.108 executor 0): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V at com.crealytics.spark.v2.excel.ExcelParser$.parseIterator(ExcelParser.scala:423) at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.readFile(ExcelPartitionReaderFactory.scala:75) at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.buildReader(ExcelPartitionReaderFactory.scala:61) at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:30) at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:99) at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:43) at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:94) at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:131) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80) at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:95) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:826) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1670) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:829) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:684) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2984) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2931) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2925) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2925) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1345) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1345) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1345) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3193) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3134) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3122) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1107) at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2628) at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:266) at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:276) at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:81) at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:87) at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:75) at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:62) at org.apache.spark.sql.execution.ResultCacheManager.collectResult$1(ResultCacheManager.scala:587) at org.apache.spark.sql.execution.ResultCacheManager.computeResult(ResultCacheManager.scala:596) at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:542) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:541) at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:438) at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:417) at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:422) at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3132) at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:3123) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3930) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:209) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:356) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:160) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:958) at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:115) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:306) at 
org.apache.spark.sql.Dataset.withAction(Dataset.scala:3928) at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3122) at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:268) at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:102) at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$getResultBufferInternal$3(ScalaDriverLocal.scala:345) at scala.Option.map(Option.scala:230) at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$getResultBufferInternal$1(ScalaDriverLocal.scala:325) at scala.Option.map(Option.scala:230) at com.databricks.backend.daemon.driver.ScalaDriverLocal.getResultBufferInternal(ScalaDriverLocal.scala:289) at com.databricks.backend.daemon.driver.DriverLocal.getResultBuffer(DriverLocal.scala:715) at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:267) at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$11(DriverLocal.scala:602) at com.databricks.logging.Log4jUsageLoggingShim$.$anonfun$withAttributionContext$1(Log4jUsageLoggingShim.scala:28) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94) at com.databricks.logging.Log4jUsageLoggingShim$.withAttributionContext(Log4jUsageLoggingShim.scala:26) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:205) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:204) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:60) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:240) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:225) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:60) at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:579) at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:615) at scala.util.Try$.apply(Try.scala:213) at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:607) at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:526) at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:561) at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:431) at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:374) at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:225) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V at com.crealytics.spark.v2.excel.ExcelParser$.parseIterator(ExcelParser.scala:423) at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.readFile(ExcelPartitionReaderFactory.scala:75) at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.buildReader(ExcelPartitionReaderFactory.scala:61) at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:30) at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) at 
org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:99) at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:43) at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:94) at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:131) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80) at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:95) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:826) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1670) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:829) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:684) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
A NoSuchMethodError usually points to a compatibility issue, but I have tried several different configurations and always hit the same problem.
I want to use the library from PySpark, but I have tried Scala too and get the same error.
It works fine when I use "com.crealytics.spark.excel" for just a single file.
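For example, a single-file read along these lines works (a sketch; the path is a placeholder):

```scala
// V1 data source, pointed at one file instead of the folder (placeholder path)
val one = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("s3://bucket/prefix/some-file.xlsx")
```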
Thanks a lot.
Expected Behavior
No response
Steps To Reproduce
```scala
%scala
val df = spark.read
  .format("excel")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("s3://bucket/prefix/")

display(df)
```
```python
%python
df = (spark
  .read
  .format("excel")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("s3://bucket/prefix/")
)

display(df)
```
Environment
- Spark version: 3.2.1
- Spark-Excel version: 3.2.1_0.17.0 (Scala 2.12)
- OS: Databricks cluster
- Cluster environment: 10.4 LTS (Photon)
Anything else?
No response
Top GitHub Comments
This looks identical to the issue I get running on Azure Databricks 10.4, described here: https://github.com/crealytics/spark-excel/issues/467#issuecomment-1087075025
Out of interest, I pulled the FailureSafeParser constructor details from the Databricks runtime to compare against the Apache definition.
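A minimal sketch of that kind of check (my own reconstruction via reflection, not the exact snippet from the comment), which lists the constructors exposed by the running cluster:

```scala
// %scala cell: print every FailureSafeParser constructor present in this runtime
Class.forName("org.apache.spark.sql.catalyst.util.FailureSafeParser")
  .getDeclaredConstructors
  .foreach(println)
```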
Compare that against what is expected from Apache Spark: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala
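For reference, the four-argument constructor that spark-excel is compiled against (it matches both the descriptor in the NoSuchMethodError above and the linked Apache source) looks roughly like this:

```scala
// Open-source Spark 3.2.x definition (abridged)
class FailureSafeParser[IN](
    rawParser: IN => Iterable[InternalRow],
    mode: ParseMode,
    schema: StructType,
    columnNameOfCorruptRecord: String)
```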
They seem to have added two non-optional arguments and one optional argument to the FailureSafeParser constructor. I guess there are a few options here, but I have no idea how realistic they are.
Any comment on how feasible these options are, and whether there are any other alternatives?
Hey @sevann71, great work and follow-up! I'm wondering if the Databricks guys could do something like the Scala compiler team does and run the unit tests of "popular" (😁) Spark plugins on their platform. Or maybe use MiMa as well.
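For what it's worth, a minimal MiMa setup looks roughly like the sketch below (my assumption of how it could be applied, with placeholder coordinates and versions; MiMa compares a build against a previously published artifact, so a vendor build could be pointed at the corresponding Apache release):

```scala
// project/plugins.sbt
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.0")

// build.sbt of the module being checked (placeholder coordinates):
// flag binary-incompatible changes relative to the upstream Apache artifact
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-catalyst" % "3.2.1")

// run the check with: sbt mimaReportBinaryIssues
```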