Incorrect filtering by enqueuedTime

Bug Report:

Filtering by startingPosition and endingPosition in Event Hubs does not seem to work correctly: the query returns events whose EnqueuedTime is outside the specified bounds.

Actual behavior

By running the following code sample (credentials redacted)

import org.apache.spark.sql.SparkSession

// Local session with the session time zone pinned to UTC
val spark = SparkSession.builder()
  .appName("AbrisPoc")
  .master("local[2]")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()

// Batch read bounded by enqueued time at both ends
val df = spark.read.format("eventhubs")
  .option("eventhubs.connectionString", "Endpoint=amqps://XXXXXXXXXX.servicebus.windows.net/XXXXXXXXXX;EntityPath=XXXXXXXXXXXXXXXXXXX;etc.")
  .option("eventhubs.startingPosition", """{"offset": null, "seqNo": -1, "enqueuedTime": "2019-04-26T00:00:00.000000Z", "isInclusive": true}""")
  .option("eventhubs.endingPosition", """{"offset": null, "seqNo": -1, "enqueuedTime": "2019-04-26T23:59:59.999999Z", "isInclusive": true}""")
  .load()

// Latest enqueued time among the returned events
df.selectExpr("""MAX(EnqueuedTime)""").show(10, truncate = false)

We get the following result:

  +-----------------------+
  |max(EnqueuedTime)      |
  +-----------------------+
  |2019-04-27 00:13:00.889|
  +-----------------------+

which is outside the specified interval of 2019-04-26T00:00:00.000000Z to 2019-04-26T23:59:59.999999Z.

Expected behavior

All the returned events should have an EnqueuedTime between the specified bounds.

Spark version:

2.3.3

spark-eventhubs artifactId and version:

from my sbt project:

libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.9"
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs" % "2.2.0"

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

guoch commented, Jun 22, 2020 (1 reaction)

endingPosition / endingPositions determine the ending position of a batch query. Since a streaming query has no end position, this value is simply ignored even if it has been set in the client code. If you want to read events up to a certain point, use a batch query instead of a streaming query.

Hi @nyaghma, thank you very much for this reply and confirmation. If possible, I suggest adding this comment to the end position entry here as well: https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/structured-streaming-eventhubs-integration.md#eventhubsconf so that it can help others who have a similar question.
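
The batch-only behavior of endingPosition can be illustrated with a small sketch. It only builds the option map, reusing the option keys and the JSON shape from the bug report above; the helper names `position` and `batch_options` and the placeholder connection string are hypothetical and not part of the connector.

```python
import json


def position(enqueued_time: str, inclusive: bool = True) -> str:
    """Serialize an enqueued-time position in the JSON shape used in the bug report."""
    return json.dumps({
        "offset": None,          # serialized as null
        "seqNo": -1,
        "enqueuedTime": enqueued_time,
        "isInclusive": inclusive,
    })


def batch_options(connection_string: str, start: str, end: str) -> dict:
    """Collect the reader options for a bounded (batch) read.

    Hypothetical helper: the keys are the ones used in the report.
    """
    return {
        "eventhubs.connectionString": connection_string,
        "eventhubs.startingPosition": position(start),
        "eventhubs.endingPosition": position(end),
    }


# Placeholder connection string; timestamps use millisecond precision.
opts = batch_options(
    "<connection string>",
    "2019-04-26T00:00:00.000Z",
    "2019-04-26T23:59:59.999Z",
)
```

In a batch query these options would be passed along the lines of `spark.read.format("eventhubs").options(**opts).load()`. According to the comment above, the same endingPosition entry would be ignored in a `spark.readStream` query. Some connector versions may also expect the connection string to be encrypted first, so check the connector's PySpark documentation for the version in use.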

sandeep8530 commented, Jun 1, 2020 (1 reaction)

@nyaghma, thanks a lot for looking into it. Yes, it works fine if I use only 3-digit (millisecond) precision.

However, this documentation (https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md) uses the code below in the filter, which produces 6 fractional digits. I would appreciate it if you could update it to avoid further confusion.

endTime = dt.now().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
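
One possible way to get the 3-digit precision reported to work is to format the fractional part by hand instead of relying on `%f`, which always emits six digits. This is a minimal sketch using only the Python standard library; the helper name `to_millis_iso` is hypothetical, and it truncates rather than rounds the microseconds.

```python
from datetime import datetime, timezone


def to_millis_iso(ts: datetime) -> str:
    """Format a datetime as a UTC timestamp with exactly three fractional digits."""
    ts_utc = ts.astimezone(timezone.utc)
    # Drop the last three microsecond digits so only milliseconds remain.
    millis = ts_utc.microsecond // 1000
    return ts_utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


# Current time, suitable for use as an enqueuedTime bound.
end_time = to_millis_iso(datetime.now(timezone.utc))

# A fixed value to show the truncation from six digits to three.
example = to_millis_iso(
    datetime(2019, 4, 26, 23, 59, 59, 999999, tzinfo=timezone.utc)
)
```

Passing a timezone-aware datetime avoids `astimezone` interpreting a naive value as local time.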
