Error writing dataframe on Databricks (java.lang.NoSuchMethodError)
I can’t seem to get even the simplest test to succeed in writing a dataframe to Cosmos DB, whether it’s my own code or the sample notebook.
All attempts consistently fail with the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
My config:
- Databricks runtime: 4.1 (includes Apache Spark 2.3.0, Scala 2.11)
- Library: azure-cosmosdb-spark_2.3.0_2.11-1.2.0
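For reference, here is a minimal sketch of the kind of write that triggers the failure. The writeConfig keys follow the connector’s documented options; the endpoint, key, database, and collection values are placeholders, not taken from the original report:

# Placeholder connection settings -- replace with your own account details
writeConfig = {
    "Endpoint": "https://<your-account>.documents.azure.com:443/",
    "Masterkey": "<your-master-key>",
    "Database": "<your-database>",
    "Collection": "<your-collection>",
    "Upsert": "true"
}

# The save() call is what raises the NoSuchMethodError on the executors
df.write \
  .format("com.microsoft.azure.cosmosdb.spark") \
  .options(**writeConfig) \
  .save()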
Here’s the complete stack trace:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command-657143037778272> in <module>()
10
11 # Write to Cosmos DB
---> 12 df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).save()
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
703 self.format(format)
704 if path is None:
--> 705 self._jwrite.save()
706 else:
707 self._jwrite.save(path)
/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
1158 answer = self.gateway_client.send_command(command)
1159 return_value = get_return_value(
-> 1160 answer, self.gateway_client, self.target_id, self.name)
1161
1162 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
318 raise Py4JJavaError(
319 "An error occurred while calling {0}{1}{2}.\n".
--> 320 format(target_id, ".", name), value)
321 else:
322 raise Py4JError(
Py4JJavaError: An error occurred while calling o199.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportAsyncImpl(DocumentBulkExecutor.java:635)
at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:496)
at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:452)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:303)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.savePartition(CosmosDBSpark.scala:454)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:399)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:190)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:181)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:300)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:111)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1738)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1726)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1725)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1725)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:963)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:963)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:963)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1961)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1909)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1897)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:760)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2147)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2166)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2191)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:951)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:375)
at org.apache.spark.rdd.RDD.collect(RDD.scala:950)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.save(CosmosDBSpark.scala:191)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.save(CosmosDBSpark.scala:487)
at com.microsoft.azure.cosmosdb.spark.DefaultSource.createRelation(DefaultSource.scala:77)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:155)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:143)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:195)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:192)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:143)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:106)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:106)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:91)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:128)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:287)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:226)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportAsyncImpl(DocumentBulkExecutor.java:635)
at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:496)
at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:452)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:303)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.savePartition(CosmosDBSpark.scala:454)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:399)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:190)
at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:181)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:300)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:111)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Top GitHub Comments
@tknandu Any update on this? It’s quite a serious issue.
This has been fixed with version azure-cosmosdb-spark_2.3.0_2.11-1.3.1. You can now resolve the Maven coordinates in Databricks instead of using the manually uploaded uber JAR. Just make sure to also add one of the slf4j bindings.
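The error itself points to a Guava version conflict: the no-argument Stopwatch.elapsed() returning java.time.Duration was only added in Guava 22.0, while the Databricks runtime puts an older Guava on the classpath, so the bulk executor’s call fails at runtime; the 1.3.1 release resolves this. To pick up the fix, install the connector from Maven coordinates (via the cluster’s Libraries tab) together with an slf4j binding, for example:

com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.3.1
org.slf4j:slf4j-simple:1.7.25

The slf4j-simple coordinate above is one example of a binding; any slf4j binding compatible with your cluster should work. After attaching both libraries and restarting the cluster, the same df.write.format("com.microsoft.azure.cosmosdb.spark") call should succeed.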