Incorrect filtering by enqueuedTime
Bug Report:
Event Hubs' filtering by startingPosition and endingPosition does not seem to work correctly: events are returned whose EnqueuedTime is outside the requested bounds.
Actual behavior
By running the following code sample (credentials redacted):
val spark = SparkSession.builder()
  .appName("AbrisPoc")
  .master("local[2]")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()

val df = spark.read
  .format("eventhubs")
  .option("eventhubs.connectionString", "Endpoint=amqps://XXXXXXXXXX.servicebus.windows.net/XXXXXXXXXX;EntityPath=XXXXXXXXXXXXXXXXXXX;etc.")
  .option("eventhubs.startingPosition", """{"offset": null, "seqNo": -1, "enqueuedTime": "2019-04-26T00:00:00.000000Z", "isInclusive": true}""")
  .option("eventhubs.endingPosition", """{"offset": null, "seqNo": -1, "enqueuedTime": "2019-04-26T23:59:59.999999Z", "isInclusive": true}""")
  .load()

df.selectExpr("MAX(EnqueuedTime)").show(10, truncate = false)
we get the following result:
+-----------------------+
|max(EnqueuedTime) |
+-----------------------+
|2019-04-27 00:13:00.889|
+-----------------------+
which is outside the specified interval of 2019-04-26T00:00:00.000000Z to 2019-04-26T23:59:59.999999Z.
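As a quick sanity check, independent of Spark, comparing the reported maximum against the requested upper bound confirms the returned event lies past the end position (a minimal Python sketch; the timestamps are taken from the output above, with the session time zone set to UTC):

```python
from datetime import datetime, timezone

# Upper bound requested in eventhubs.endingPosition
end_bound = datetime(2019, 4, 26, 23, 59, 59, 999999, tzinfo=timezone.utc)

# Maximum EnqueuedTime actually returned by the query
max_enqueued = datetime(2019, 4, 27, 0, 13, 0, 889000, tzinfo=timezone.utc)

# The returned event is ~13 minutes past the requested ending position
print(max_enqueued > end_bound)  # True
print(max_enqueued - end_bound)
```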
Expected behavior
All the returned events should have an EnqueuedTime between the specified bounds.
Spark version: 2.3.3
spark-eventhubs artifactId and version (from my sbt project):
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.9"
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs" % "2.2.0"
Issue Analytics
- Created 4 years ago
- Comments: 14 (6 by maintainers)
Top GitHub Comments
Hi @nyaghma, thank you for this reply and confirmation. If possible, I would suggest adding this comment to the end-position section of https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/structured-streaming-eventhubs-integration.md#eventhubsconf as well, so it can help others who have the same question.
@nyaghma, thanks a lot for looking into it. Yes, it works fine if I use only 3-digit millisecond precision.
However, the documentation at https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md uses the code below in the filter, which produces 6 digits. I would appreciate it if you could update this to avoid further confusion.
endTime = dt.now().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
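The comment above points at the root cause: strftime's %f always emits 6 fractional digits (microseconds), while the connector reportedly only handles 3-digit (millisecond) precision correctly. A hedged sketch of producing a millisecond-precision timestamp instead (the isoformat-based variant is a suggestion, not taken from the linked docs):

```python
from datetime import datetime, timezone

# %f expands to 6 fractional digits (microseconds), e.g. 2019-04-26T23:59:59.999999Z
six_digit = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

# timespec="milliseconds" yields exactly 3 fractional digits,
# e.g. 2019-04-26T23:59:59.999Z; replace the +00:00 offset with "Z"
three_digit = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
three_digit = three_digit.replace("+00:00", "Z")

print(six_digit)
print(three_digit)
```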