Wrong seqNo is set when reading from Event Hubs
Bug Report:
- Actual behavior: Same issue as https://github.com/Azure/azure-event-hubs-spark/issues/462. The consequence is severe: I am unable to restart the same Spark Structured Streaming query without creating a new checkpoint (a minimal sketch of the setup appears after this report).
The following is the stacktrace:
Job aborted due to stage failure: Task 30 in stage 2348.0 failed 4 times, most recent failure: Lost task 30.3 in stage 2348.0 (TID 4058, 10.139.64.6, executor 0): java.lang.IllegalStateException: In partition 30 of http-access-log, with consumer group $Default, request seqNo 19609525 is less than the received seqNo 19684911. The earliest seqNo is 19684804 and the last seqNo is 20231767
at org.apache.spark.eventhubs.client.CachedEventHubsReceiver.checkCursor(CachedEventHubsReceiver.scala:189)
at org.apache.spark.eventhubs.client.CachedEventHubsReceiver.org$apache$spark$eventhubs$client$CachedEventHubsReceiver$$receive(CachedEventHubsReceiver.scala:213)
at org.apache.spark.eventhubs.client.CachedEventHubsReceiver$.receive(CachedEventHubsReceiver.scala:288)
at org.apache.spark.eventhubs.rdd.EventHubsRDD.compute(EventHubsRDD.scala:120)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
at org.apache.spark.scheduler.Task.run(Task.scala:113)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
- Expected behavior: The offset is set correctly and the stream can be restarted from the existing checkpoint.
- Spark version: 2.4.5
- spark-eventhubs artifactId and version: com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.14.1
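The exception above is raised by the connector's cursor validation in CachedEventHubsReceiver.checkCursor (see the stack trace): the seqNo requested from the checkpoint (19609525) is behind both the cached receiver's current position (19684911) and the earliest seqNo still retained in the partition (19684804). The sketch below is a simplified paraphrase of the check as the error message describes it, not the library's actual source; the method shape and parameter names are assumptions for illustration only.

// Simplified paraphrase of the seqNo validation the error message describes
// (NOT the connector's actual source; names and structure are assumptions).
def checkCursor(requestSeqNo: Long,   // seqNo asked for, taken from the checkpoint
                receivedSeqNo: Long,  // seqNo of the first event the cached receiver returns
                earliestSeqNo: Long,  // earliest seqNo still retained in the partition
                lastSeqNo: Long): Unit = {
  if (requestSeqNo < receivedSeqNo) {
    // Here 19609525 < 19684911, and the request is even below the earliest
    // retained seqNo (19684804), so the checkpointed position cannot be served.
    throw new IllegalStateException(
      s"request seqNo $requestSeqNo is less than the received seqNo $receivedSeqNo. " +
        s"The earliest seqNo is $earliestSeqNo and the last seqNo is $lastSeqNo")
  }
}

For context, a minimal version of the setup that hits this on restart looks roughly like the following. The connection string, consumer group, and paths are placeholders; the event hub name "http-access-log" is taken from the error message; EventPosition.fromStartOfStream only applies to the first run, since on restart the connector takes its position from the checkpoint.

// Minimal sketch of the read-from-Event-Hubs / checkpointed-write setup
// (placeholder connection string, paths, and sink; not the reporter's actual job).
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("eventhubs-restart-repro").getOrCreate()

val connectionString = ConnectionStringBuilder("{EVENT HUBS CONNECTION STRING}")
  .setEventHubName("http-access-log")
  .build

val ehConf = EventHubsConf(connectionString)
  .setConsumerGroup("$Default")
  .setStartingPosition(EventPosition.fromStartOfStream)

val events = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// The first run creates the checkpoint; restarting the same query with the same
// checkpointLocation is what triggers the IllegalStateException shown above.
events.writeStream
  .format("parquet")
  .option("path", "/mnt/output/http-access-log")
  .option("checkpointLocation", "/mnt/checkpoints/http-access-log")
  .start()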
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @k4jiang, this is a known issue and we're working on a fix. We are going to release a new version with the fix within the next several days.
Version 2.3.15 includes the fix.
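For anyone hitting this, upgrading the connector coordinate to the fixed release would look roughly like the sbt line below (assuming Scala 2.11, to match the artifact reported above); Maven or the Databricks library UI takes the same coordinate.

// build.sbt: pin the connector to the release that includes the fix
// (Scala 2.11 assumed, as in the report above)
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.15"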