
Error writing dataframe on Databricks (java.lang.NoSuchMethodError)

See original GitHub issue

I can’t seem to get even the simplest test to succeed in writing a dataframe to Cosmos DB, whether it’s my own code or the sample notebook.

All attempts consistently fail with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;

My config:

  • Databricks runtime: 4.1 (includes Apache Spark 2.3.0, Scala 2.11)
  • Library: azure-cosmosdb-spark_2.3.0_2.11-1.2.0
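
For reference, the failing call is the standard DataFrameWriter path into the connector. Here is a minimal sketch of what triggers it, assuming a PySpark notebook; the option keys follow the connector's usual writeConfig convention, and every value below is a placeholder:

# Minimal sketch of the failing write; account, key, database and
# collection values are all placeholders.
writeConfig = {
    "Endpoint": "https://<your-account>.documents.azure.com:443/",
    "Masterkey": "<your-master-key>",
    "Database": "<your-database>",
    "Collection": "<your-collection>",
    "Upsert": "true",
}

# This is the call that fails with the NoSuchMethodError below.
df.write \
  .format("com.microsoft.azure.cosmosdb.spark") \
  .options(**writeConfig) \
  .save()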

Here’s the complete stack trace:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-657143037778272> in <module>()
     10 
     11 # Write to Cosmos DB
---> 12 df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).save()

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    703             self.format(format)
    704         if path is None:
--> 705             self._jwrite.save()
    706         else:
    707             self._jwrite.save(path)

/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1158         answer = self.gateway_client.send_command(command)
   1159         return_value = get_return_value(
-> 1160             answer, self.gateway_client, self.target_id, self.name)
   1161 
   1162         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                 raise Py4JJavaError(
    319                     "An error occurred while calling {0}{1}{2}.\n".
--> 320                     format(target_id, ".", name), value)
    321             else:
    322                 raise Py4JError(

Py4JJavaError: An error occurred while calling o199.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportAsyncImpl(DocumentBulkExecutor.java:635)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:496)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:452)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:303)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.savePartition(CosmosDBSpark.scala:454)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:399)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:190)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:181)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:300)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1738)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1726)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1725)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1725)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:963)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:963)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:963)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1961)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1909)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1897)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:760)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2147)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2166)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2191)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:951)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:375)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:950)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.save(CosmosDBSpark.scala:191)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.save(CosmosDBSpark.scala:487)
	at com.microsoft.azure.cosmosdb.spark.DefaultSource.createRelation(DefaultSource.scala:77)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:88)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:155)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:143)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:195)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:192)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:143)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:106)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:91)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:128)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:287)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:281)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:226)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportAsyncImpl(DocumentBulkExecutor.java:635)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:496)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:452)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:303)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.savePartition(CosmosDBSpark.scala:454)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:399)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:190)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:181)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:300)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
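
A note on what this signature implies: Stopwatch.elapsed() returning a java.time.Duration only exists in newer Guava releases, so a NoSuchMethodError here usually means an older Guava on the runtime classpath is shadowing the version the bulk executor was compiled against. As a quick driver-side check (a hypothetical diagnostic, assuming the standard spark._jvm py4j gateway in a Databricks notebook), you can ask the JVM which jar the class was actually loaded from:

# Hypothetical diagnostic: report which jar supplied Guava's Stopwatch
# on the driver. Note this only inspects the driver classpath; executors
# may still resolve a different copy.
stopwatch_cls = spark._jvm.java.lang.Class.forName(
    "com.google.common.base.Stopwatch")
location = stopwatch_cls.getProtectionDomain().getCodeSource().getLocation()
print(location.toString())  # path of the Guava jar that won

If the printed jar is an old Guava bundled with the Spark runtime, the newer Guava the connector needs never gets a chance to load.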

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

3 reactions
gugu91 commented on Sep 28, 2018

@tknandu Any update on this? It’s quite a serious issue.

0 reactions
FabianMeiswinkel commented on Dec 19, 2018

This has been fixed in version azure-cosmosdb-spark_2.3.0_2.11-1.3.1. You can now resolve the Maven coordinates in Databricks instead of using the manually uploaded uber JAR. Just make sure to also add one of the slf4j bindings.
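
For anyone wiring this up programmatically, the install can be expressed against the Databricks Libraries API. This is a sketch under stated assumptions: the Maven coordinates are derived from the artifact name in the comment above, slf4j-simple stands in for "one of the slf4j bindings", and the workspace host, token, and cluster id are placeholders:

import requests

# Hypothetical sketch: install the fixed connector plus an slf4j binding
# via the Databricks Libraries API (2.0). All identifiers are placeholders.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"
payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [
        {"maven": {"coordinates":
            "com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.3.1"}},
        # One slf4j binding is required; slf4j-simple is just an example.
        {"maven": {"coordinates": "org.slf4j:slf4j-simple:1.7.25"}},
    ],
}
resp = requests.post(host + "/api/2.0/libraries/install",
                     headers={"Authorization": "Bearer " + token},
                     json=payload)
resp.raise_for_status()

The same coordinates can be added through the cluster's Libraries tab in the UI; the API route is just the scriptable equivalent.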

Read more comments on GitHub >

Top Results From Across the Web

Error (java.lang.NoSuchMethodError) when sending a Spark ...
Error (java.lang.NoSuchMethodError) when sending a Spark data frame to Azure Eventhubs from a Databricks notebook.
Read more >
org.apache.spark.sql.AnalysisException" while writing data to ...
Getting Error "java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException" while writing data to event hub for streaming.
Read more >
java.lang.NoSuchMethodError in databricks
During run time, it throws me an error java.lang.NoSuchMethodError. The method exists in the package but still, I am getting this error.
Read more >
java.lang.NoClassDefFoundError: org/apache/spark/internal ...
I have to write the extracted data from XML to DB , i am using Dataframe for transformation and trying to load that...
Read more >
An error occurred while calling o5082.csv. : org.apache.spark ...
SparkException: Job aborted. when writing to csv ... Hi @laus (Customer), Looks like you want to save your dataframe as csv. ... java.lang....
Read more >
