Writing into EventHubs, Spark executors fail silently when getting throttled

I have a job running in Databricks that writes the result of a batch query (which returns 7 million rows) into an event hub with 1 TU. The job completes successfully, but when I inspect the event hub I see only around 2 million events published.
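For context, a back-of-the-envelope estimate suggests why throttling is likely in this setup. The sketch below assumes the documented Event Hubs ingress quota of 1,000 events per second (or 1 MB per second) per throughput unit; the object and method names are made up for illustration. At 1 TU, 7 million events need at least 7,000 seconds, just under two hours, so a batch write that pushes faster than that will hit the quota.

```scala
object ThroughputEstimate {
  // Assumption: ingress quota of 1,000 events/s per throughput unit (TU).
  val EventsPerSecondPerTu: Long = 1000L

  // Minimum number of seconds needed to publish `events` without exceeding the quota.
  def minSeconds(events: Long, throughputUnits: Long): Long =
    math.ceil(events.toDouble / (throughputUnits * EventsPerSecondPerTu)).toLong
}

// Figures from the issue: 7 million rows, 1 TU.
val seconds = ThroughputEstimate.minSeconds(7000000L, 1L)
println(s"At least $seconds s to publish without being throttled")
```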

On further inspection of the executor logs, I see com.microsoft.azure.eventhubs.ServerBusyException being thrown, but this error does not abort the Spark job. It is therefore possible for the write operation to return without any exception and still fail to publish all the events.

Sample code:

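import org.apache.spark.eventhubs.EventHubsConf

// peersEhConnStr: Event Hubs connection string; peersFrame: the batch query result
val ehWriteConf = EventHubsConf(peersEhConnStr)
// $"..." needs spark.implicits._ (pre-imported in Databricks notebooks)
val peersLAFrame = peersFrame.select($"body", $"properties")
peersLAFrame.write
  .format("eventhubs")
  .options(ehWriteConf.toMap)
  .save()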

Bug Report:

  • Actual behavior: Throttled Event Hubs write/ingress operations fail silently.
  • Expected behavior: When the Event Hubs operation is throttled, that error should abort the job.
  • Spark version: Databricks 5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)
  • spark-eventhubs artifactId and version: azure-eventhubs-spark_2.11:2.3.13
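The difference between the actual and expected behavior can be illustrated with a minimal, self-contained sketch. This is not the connector's code: `ThrottledException` and `FakeSender` are hypothetical stand-ins for `ServerBusyException` and the real Event Hubs client, used only to show how a swallowed send error lets a write return normally while events are dropped, and how propagating the error fails the write instead.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for com.microsoft.azure.eventhubs.ServerBusyException.
final class ThrottledException(msg: String) extends RuntimeException(msg)

object SilentFailureSketch {
  // Hypothetical sender: accepts the first `capacity` events, then throttles.
  final class FakeSender(capacity: Int) {
    private var sent = 0
    def send(event: String): Unit = {
      if (sent >= capacity) throw new ThrottledException(s"throttled at event: $event")
      sent += 1
    }
    def published: Int = sent
  }

  // Swallows send failures: the caller sees a normal return even though events are dropped.
  def writeSwallowing(events: Seq[String], sender: FakeSender): Unit =
    events.foreach { e =>
      Try(sender.send(e)) match {
        case Success(_)  => ()
        case Failure(ex) => System.err.println(s"send failed: ${ex.getMessage}") // logged only
      }
    }

  // Propagates the first send failure, so the caller (a Spark task, in the real job) fails.
  def writePropagating(events: Seq[String], sender: FakeSender): Unit =
    events.foreach(sender.send)
}
```

With a sender that throttles after two events, `writeSwallowing` returns normally with only two events published, which mirrors the report above; `writePropagating` throws on the third event, which is the behavior the reporter expected.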

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

rvoak commented, Jul 13, 2020 (1 reaction)

@nyaghma Yes. Just tested it; throws an exception with 2.3.16!

nyaghma commented, Jul 9, 2020 (0 reactions)

@rvoak Thanks for letting us know. Can you please try again with version 2.3.16? You should see the exception in this version. Please let me know if the issue still occurs with 2.3.16.
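For an sbt-built job, moving to the suggested release might look like the line below. This is a sketch: it assumes the connector is published under the group ID `com.microsoft.azure`, and the artifact name is taken from the bug report. On Databricks, the equivalent is attaching the same coordinate to the cluster as a Maven library.

```scala
// build.sbt (sketch): pin the connector to the release suggested in this comment.
// Assumption: group ID com.microsoft.azure; artifact name as given in the bug report.
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.16"
```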
