
Error writing dataframe on Databricks (java.lang.NoSuchMethodError)

See original GitHub issue

I can’t seem to get even the simplest test to succeed in writing a dataframe to Cosmos DB, whether it’s my own code or the sample notebook.

All attempts consistently fail with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;

My config:

  • Databricks runtime: 4.1 (includes Apache Spark 2.3.0, Scala 2.11)
  • Library: azure-cosmosdb-spark_2.3.0_2.11-1.2.0
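
For reference, the failing call is the standard DataFrameWriter path into the connector. Here is a minimal sketch of what triggers it, assuming a PySpark notebook; the option keys follow the connector's usual writeConfig convention, and every value below is a placeholder:

# Minimal sketch of the failing write; account, key, database and
# collection values are all placeholders.
writeConfig = {
    "Endpoint": "https://<your-account>.documents.azure.com:443/",
    "Masterkey": "<your-master-key>",
    "Database": "<your-database>",
    "Collection": "<your-collection>",
    "Upsert": "true",
}

# This is the call that fails with the NoSuchMethodError below.
df.write \
  .format("com.microsoft.azure.cosmosdb.spark") \
  .options(**writeConfig) \
  .save()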

Here’s the complete stack trace:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-657143037778272> in <module>()
     10 
     11 # Write to Cosmos DB
---> 12 df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).save()

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    703             self.format(format)
    704         if path is None:
--> 705             self._jwrite.save()
    706         else:
    707             self._jwrite.save(path)

/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1158         answer = self.gateway_client.send_command(command)
   1159         return_value = get_return_value(
-> 1160             answer, self.gateway_client, self.target_id, self.name)
   1161 
   1162         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                 raise Py4JJavaError(
    319                     "An error occurred while calling {0}{1}{2}.\n".
--> 320                     format(target_id, ".", name), value)
    321             else:
    322                 raise Py4JError(

Py4JJavaError: An error occurred while calling o199.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost, executor driver): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportAsyncImpl(DocumentBulkExecutor.java:635)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:496)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:452)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:303)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.savePartition(CosmosDBSpark.scala:454)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:399)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:190)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:181)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:300)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1738)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1726)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1725)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1725)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:963)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:963)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:963)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1961)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1909)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1897)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:760)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2147)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2166)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2191)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:951)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:375)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:950)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.save(CosmosDBSpark.scala:191)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.save(CosmosDBSpark.scala:487)
	at com.microsoft.azure.cosmosdb.spark.DefaultSource.createRelation(DefaultSource.scala:77)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:88)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:155)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:143)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:195)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:192)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:143)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:106)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:91)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:128)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:287)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:281)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:226)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsed()Ljava/time/Duration;
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportAsyncImpl(DocumentBulkExecutor.java:635)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:496)
	at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:452)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:303)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.savePartition(CosmosDBSpark.scala:454)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:399)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:190)
	at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:181)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:859)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:300)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
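
A note on what this signature implies: Stopwatch.elapsed() returning a java.time.Duration only exists in newer Guava releases, so a NoSuchMethodError here usually means an older Guava on the runtime classpath is shadowing the version the bulk executor was compiled against. As a quick driver-side check (a hypothetical diagnostic, assuming the standard spark._jvm py4j gateway in a Databricks notebook), you can ask the JVM which jar the class was actually loaded from:

# Hypothetical diagnostic: report which jar supplied Guava's Stopwatch
# on the driver. Note this only inspects the driver classpath; executors
# may still resolve a different copy.
stopwatch_cls = spark._jvm.java.lang.Class.forName(
    "com.google.common.base.Stopwatch")
location = stopwatch_cls.getProtectionDomain().getCodeSource().getLocation()
print(location.toString())  # path of the Guava jar that won

If the printed jar is an old Guava bundled with the Spark runtime, the newer Guava the connector needs never gets a chance to load.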

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

3 reactions
gugu91 commented on Sep 28, 2018

@tknandu Any update on this? It’s quite a serious issue.

0 reactions
FabianMeiswinkel commented on Dec 19, 2018

This has been fixed in version azure-cosmosdb-spark_2.3.0_2.11-1.3.1. You can now resolve the Maven coordinates in Databricks instead of using the manually uploaded uber JAR. Just make sure to also add one of the slf4j bindings.
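
For anyone wiring this up programmatically, the install can be expressed against the Databricks Libraries API. This is a sketch under stated assumptions: the Maven coordinates are derived from the artifact name in the comment above, slf4j-simple stands in for "one of the slf4j bindings", and the workspace host, token, and cluster id are placeholders:

import requests

# Hypothetical sketch: install the fixed connector plus an slf4j binding
# via the Databricks Libraries API (2.0). All identifiers are placeholders.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"
payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [
        {"maven": {"coordinates":
            "com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.3.1"}},
        # One slf4j binding is required; slf4j-simple is just an example.
        {"maven": {"coordinates": "org.slf4j:slf4j-simple:1.7.25"}},
    ],
}
resp = requests.post(host + "/api/2.0/libraries/install",
                     headers={"Authorization": "Bearer " + token},
                     json=payload)
resp.raise_for_status()

The same coordinates can be added through the cluster's Libraries tab in the UI; the API route is just the scriptable equivalent.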

Read more comments on GitHub >

Top Results From Across the Web

Error (java.lang.NoSuchMethodError) when sending a Spark ...
Error (java.lang.NoSuchMethodError) when sending a Spark data frame to Azure Eventhubs from a Databricks notebook.
Read more >
org.apache.spark.sql.AnalysisException" while writing data to ...
Getting Error "java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException" while writing data to event hub for streaming.
Read more >
java.lang.NoSuchMethodError in databricks
During run time, it throws me an error java.lang.NoSuchMethodError. The method exists in the package but still, I am getting this error.
Read more >
java.lang.NoClassDefFoundError: org/apache/spark/internal ...
I have to write the extracted data from XML to DB , i am using Dataframe for transformation and trying to load that...
Read more >
An error occurred while calling o5082.csv. : org.apache.spark ...
SparkException: Job aborted. when writing to csv ... Hi @laus (Customer), Looks like you want to save your dataframe as csv. ... java.lang....
Read more >
