EventHub integration offset value errors

Repro steps

Provide the steps required to reproduce the problem

  1. Create an EventHub trigger integration.

  2. Pause or delete the integration for a period that exceeds the retention of the EventHub.

  3. Resume the integration / unpause the trigger.

Expected behavior

Upon resumption of the trigger, the stored offsets will be invalid. The EventHub trigger should compensate for this and be able to reset the offset.

In addition, upon the deletion of an input trigger, the corresponding blob data for offset checkpointing should be deleted from the storage account.

Actual behavior

Any partition with an invalid offset constantly produces errors from the AMQP consumer. The trigger never fixes these offsets, and the error does not appear in the Functions app logs. It is visible only in Application Insights, which means Microsoft support cannot identify the problem either.

e.g.

System.ArgumentException: The supplied offset '55838201792' is invalid. The last offset in the system is '30089580512' TrackingId:<redacted>_B14, SystemTracker:<redacted>:eventhub:<redacted>~12287, Timestamp:2019-01-18T04:56:57 Reference:<redacted>, TrackingId:<redacted>_B14, SystemTracker:<redacted>:eventhub:<redacted>~12287|$default, Timestamp:2019-01-18T04:56:57 TrackingId:<redacted>_G6, SystemTracker:gateway5, Timestamp:2019-01-18T04:56:57
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.PartitionReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.ReceivePumpAsync(CancellationToken cancellationToken, Boolean invokeWhenNoEvents)



System.OperationCanceledException: The AMQP object session36857 is aborted.
   at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.Azure.Amqp.AmqpObject.OpenAsyncResult.End(IAsyncResult result)
   at Microsoft.Azure.Amqp.AmqpObject.EndOpen(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.CreateLinkAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.CreateValue(TaskCompletionSource`1 tcs, TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.PartitionReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.ReceivePumpAsync(CancellationToken cancellationToken, Boolean invokeWhenNoEvents)

Known workarounds

We believe that deleting the blobs with the bad offsets will resolve the problem by causing the blob to be recreated.
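A minimal sketch of that workaround, assuming the checkpoint layout described under "For the details" below. It treats the first path segment (azure-webjobs-eventhub) as the container name and the rest as a blob prefix, which is an assumption. The `ContainerLike` protocol, `checkpoint_prefix` and `delete_checkpoints` are hypothetical names. The two methods on the protocol are modeled on `ContainerClient` from the azure-storage-blob package, so verify them against your SDK version. The sketch also assumes the trigger is stopped while the blobs are removed.

```python
from typing import Iterable, List, Protocol

# First path segment from the issue; assumed here to be the container name.
CONTAINER = "azure-webjobs-eventhub"


class ContainerLike(Protocol):
    """Hypothetical minimal interface, modeled on azure-storage-blob's ContainerClient."""

    def list_blob_names(self, *, name_starts_with: str) -> Iterable[str]: ...

    def delete_blob(self, blob: str) -> None: ...


def checkpoint_prefix(namespace: str, event_hub: str, consumer_group: str) -> str:
    """Build the blob prefix that holds one checkpoint file per partition."""
    return f"{namespace}.servicebus.windows.net/{event_hub}/{consumer_group}/"


def delete_checkpoints(container: ContainerLike, prefix: str, dry_run: bool = True) -> List[str]:
    """Return the checkpoint blobs under prefix; delete them unless dry_run is set."""
    names = list(container.list_blob_names(name_starts_with=prefix))
    if not dry_run:
        for name in names:
            container.delete_blob(name)
    return names
```

With the real SDK, the container object would presumably come from `ContainerClient.from_connection_string(conn_str, CONTAINER)`. Running with `dry_run=True` first lists what would be removed without touching anything.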

Additional information

The Azure EventHub and Functions integration should do two things:

  • Upon detecting an offset error, it needs to decide what to do: reset the offset checkpoint and then either recapture from the earliest data in that partition (probably the safest option) or capture from the latest data. There might be value in making this user-configurable.
  • When an EventHub trigger is deleted, the corresponding offset data should be deleted from the storage account.
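The reset decision in the first bullet could be sketched as follows. This is an illustration of the requested behavior, not how the trigger works today. `ResetPolicy` and `resolve_start_offset` are hypothetical names, and the sketch assumes the first and last available offsets of the partition are known to the caller.

```python
from enum import Enum


class ResetPolicy(Enum):
    """Hypothetical user-configurable choice for recovering from an invalid offset."""

    EARLIEST = "earliest"  # replay from the oldest retained event (safest)
    LATEST = "latest"      # skip ahead to the newest event


def resolve_start_offset(
    stored: int,
    first_available: int,
    last_available: int,
    policy: ResetPolicy = ResetPolicy.EARLIEST,
) -> int:
    """Keep the stored offset if it is still valid, otherwise reset it per policy."""
    if first_available <= stored <= last_available:
        return stored
    if policy is ResetPolicy.EARLIEST:
        return first_available
    return last_available


# Offsets taken from the error message above; a first available offset of 0 is assumed.
print(resolve_start_offset(55838201792, 0, 30089580512))
```

Defaulting to `EARLIEST` follows the issue's suggestion that recapturing from the earliest data is probably the safest choice, at the cost of reprocessing events.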

Bonus: it would be nice if the user could see these errors in the logs of the functions app, but they do not appear there.

For the details:

The EventHub integration keeps offset data in a path in the storage account at: azure-webjobs-eventhub/<namespace>.servicebus.windows.net/<eventhub name>/<consumer group>/

In here, there is a file for each partition. The contents of each file are structured as shown below:

{"Offset":"<offset count>","SequenceNumber":<number>,"PartitionId":"0","Owner":"<uuid>","Token":"<uuid>","Epoch":<number>}
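For inspecting these files, a small parser along the following lines may help. It is a sketch that assumes the field names shown above and that Offset is a decimal string. The `Checkpoint` class and `parse_checkpoint` function are hypothetical, and every value in the sample document except the offset (taken from the error message) is made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """One per-partition checkpoint file, with field names as shown in the issue."""

    offset: int
    sequence_number: int
    partition_id: str
    owner: str
    token: str
    epoch: int


def parse_checkpoint(raw: str) -> Checkpoint:
    """Parse the JSON body of a checkpoint blob (Offset assumed to be a decimal string)."""
    doc = json.loads(raw)
    return Checkpoint(
        offset=int(doc["Offset"]),
        sequence_number=int(doc["SequenceNumber"]),
        partition_id=str(doc["PartitionId"]),
        owner=str(doc["Owner"]),
        token=str(doc["Token"]),
        epoch=int(doc["Epoch"]),
    )


# Illustrative sample; only the offset comes from the error message in the issue.
sample = (
    '{"Offset":"55838201792","SequenceNumber":1,"PartitionId":"0",'
    '"Owner":"owner-id","Token":"token-id","Epoch":1}'
)
print(parse_checkpoint(sample))
```

Comparing the parsed `offset` with the "last offset in the system" reported in the exception shows which partitions hold a stale checkpoint.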

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 33 (2 by maintainers)

Top GitHub Comments

8 reactions
mbrancato commented, Jan 23, 2019

Noise? No. This prevents messages from being ingested and processed by the function from any of the affected partitions until the offsets are fixed.

4 reactions
mbrancato commented, Mar 4, 2020

Hi @jeffhollan - my original problem was not due to deleting the EventHub. It was because the EventHub consumer was paused longer than the retention period. I just want to make clear that deleting storage, etc. were attempts to fix the problem, not the cause. That said, messing with storage, etc. can land you in the same state.

I think EventHub just needs to detect when the offset is invalid and clean up the storage.

