question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Wrong seqNo is set when reading from eventhubs

See original GitHub issue

Bug Report:

The following is the stacktrace:

Job aborted due to stage failure: Task 30 in stage 2348.0 failed 4 times, most recent failure: Lost task 30.3 in stage 2348.0 (TID 4058, 10.139.64.6, executor 0): java.lang.IllegalStateException: In partition 30 of http-access-log, with consumer group $Default, request seqNo 19609525 is less than the received seqNo 19684911. The earliest seqNo is 19684804 and the last seqNo is 20231767
	at org.apache.spark.eventhubs.client.CachedEventHubsReceiver.checkCursor(CachedEventHubsReceiver.scala:189)
	at org.apache.spark.eventhubs.client.CachedEventHubsReceiver.org$apache$spark$eventhubs$client$CachedEventHubsReceiver$$receive(CachedEventHubsReceiver.scala:213)
	at org.apache.spark.eventhubs.client.CachedEventHubsReceiver$.receive(CachedEventHubsReceiver.scala:288)
	at org.apache.spark.eventhubs.rdd.EventHubsRDD.compute(EventHubsRDD.scala:120)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
	at org.apache.spark.scheduler.Task.run(Task.scala:113)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
  • Expected behavior Offset is set correctly
  • Spark version 2.4.5
  • spark-eventhubs artifactId and version com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.14.1

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:7 (4 by maintainers)

github_iconTop GitHub Comments

2reactions
sjkwakcommented, Apr 3, 2020

Hi @k4jiang it is a known issue and we’re working on a fix. We are going to release a new version with the fix in several days.

0reactions
nyaghmacommented, Jun 8, 2020

The version 2.3.15 includes the fix.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Wrong seqNo is set when reading from evenehubs #462
I am using azure-event-hubs-spark connector to read data from eventhubs and write to one elasticsearch cluster and I got the following errors.
Read more >
azure-event-hubs-spark/Lobby - Gitter
I'm trying to read static csv data files and sent each row as message to EventHub. Basically, using spark.read.format("csv") to create the dataframe...
Read more >
Azure Event Hubs Checkpoint Store client library for Java
The offset/sequence number enables an event consumer (reader) to specify a point in the event stream from which they want to begin reading ......
Read more >
azure-eventhub-checkpointstoreblob-aio - PyPI
Microsoft Azure Event Hubs checkpointer implementation with Blob Storage ... The offset/sequence number enables an event consumer (reader) to specify a ...
Read more >
Unable to read Azure Eventhub topics from spark
You need to add EventHubs package when creating session: park = SparkSession.builder.appName('ntorq_eventhub_load')\ ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found