[QUERY] Zero downtime Event Hub processor migration from v4 to v5
Library name and version
Azure.Messaging.EventHubs 5.6.2
Query/Question
Is it possible to achieve zero-downtime migration of Event Hub consumers using v4 (Microsoft.Azure.EventHubs) to v5 (Azure.Messaging.EventHubs)? If not, do you have any tips on minimizing the downtime?
The breaking change in checkpoint format is a major problem for the migration. Currently we have the following plan:
1. Use a custom `EventProcessor<TPartition>` that is able to read legacy checkpoints (in `OnInitializingPartitionAsync`); a rough sketch of the idea follows this list.
2. The old consumers keep running on side A.
3. Deploy the new code to side B. Events are not consumed there, since the v5 SDK uses the epoch value `0` and the new consumers are endlessly restarting after hitting the "epoch exception" [1].
4. Disconnect the old consumers by stopping side A.
5. The new consumers start consuming events on side B.
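As a rough illustration of step 1, the sketch below seeds the v5 starting position from a legacy v4 checkpoint. It is not the code from this issue: it uses the higher-level `EventProcessorClient` (`PartitionInitializingAsync` / `DefaultStartingPosition`) rather than a custom `EventProcessor<TPartition>`, and it assumes the default v4 layout of one JSON lease blob per partition at `<leaseContainer>/<consumerGroup>/<partitionId>` containing an `Offset` field, plus the `long`-based `EventPosition.FromOffset` overload present in 5.6.x. Adjust for your actual checkpoint layout.

```csharp
// Hedged sketch: seed v5 starting positions from legacy v4 checkpoints.
// The blob path and JSON shape are assumptions, not values taken from this issue.
using System;
using System.Text.Json;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Storage.Blobs;

public static class LegacyCheckpointSeeding
{
    public static void Attach(
        EventProcessorClient processor,
        BlobContainerClient legacyLeaseContainer,   // the v4 lease/checkpoint container
        string consumerGroup)
    {
        processor.PartitionInitializingAsync += async args =>
        {
            // DefaultStartingPosition is only honored when no v5 checkpoint exists for the
            // partition, so this stops mattering once v5 checkpoints are being written.
            args.DefaultStartingPosition = EventPosition.Earliest;

            try
            {
                // Default v4 layout: one JSON lease blob per partition at
                // "<consumerGroup>/<partitionId>" inside the lease container.
                var blob = legacyLeaseContainer.GetBlobClient($"{consumerGroup}/{args.PartitionId}");

                if (await blob.ExistsAsync(args.CancellationToken))
                {
                    var download = await blob.DownloadContentAsync(args.CancellationToken);
                    using var lease = JsonDocument.Parse(download.Value.Content.ToString());

                    // The v4 lease blob serializes the checkpointed offset as a string.
                    if (lease.RootElement.TryGetProperty("Offset", out var offset)
                        && offset.ValueKind == JsonValueKind.String
                        && long.TryParse(offset.GetString(), out var offsetValue))
                    {
                        // Resume just after the last event checkpointed by the v4 processor.
                        args.DefaultStartingPosition = EventPosition.FromOffset(offsetValue, isInclusive: false);
                    }
                }
            }
            catch (Exception)
            {
                // If the legacy checkpoint cannot be read, fall back to Earliest; real code
                // should log here and choose between Earliest/Latest deliberately.
            }
        };
    }
}
```

The same translation could equally live inside the custom `EventProcessor<TPartition>` from the plan; either way it only matters until the first v5 checkpoint for a partition has been written.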
There is a possibility of downtime between steps 4 and 5 (depending on how fast you can stop the old consumers, how fast the Azure Event Hubs service detects the disconnects, and how fast the new consumers detect the freed partitions). Are there any settings in the SDK or the Azure portal that would allow minimizing that?
The best option would be to pass a higher epoch in the v5 SDK (forcing the old consumers to disconnect), but that is not possible: the `0` value is hardcoded in the `EventProcessor` class.
[1]
Exception message: "Receiver 'e69d42ae-72a6-418e-be1f-4d388a390188' with a higher epoch '2' already exists. Receiver 'P1-b47177da-cf2b-46f0-8cd1-dfa007165fd9' with epoch 0 cannot be created. Make sure you are creating receiver with increasing epoch value to ensure connectivity, or ensure all old epoch receivers are closed or disconnected."
Environment
No response
Top GitHub Comments
Ok, I think it is all clear now: `LoadBalancingInterval * 2` downtime (deployment-related delays easily outweigh that), and the resource-usage increase is temporary (during migration), so it is also not a problem. Thanks for your help. Feel free to close the issue.
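For reference, the load-balancing interval and ownership expiration discussed above are configured on the processor options. This is a minimal sketch assuming `EventProcessorClient` and the option names as they exist on `EventProcessorClientOptions` in 5.6.x; the interval values and connection placeholders are illustrative only, and shorter intervals mean more frequent checkpoint-store requests.

```csharp
// Hedged sketch: tighten the window in which side B takes over after side A stops.
// The interval values below are placeholders, not recommendations.
using System;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Processor;
using Azure.Storage.Blobs;

var options = new EventProcessorClientOptions
{
    // How often each processor runs a load-balancing cycle (the "LoadBalancingInterval"
    // referred to above); shorter intervals mean faster detection of free partitions.
    LoadBalancingUpdateInterval = TimeSpan.FromSeconds(5),

    // How long an ownership claim remains valid without renewal; partitions owned by the
    // stopped side A processors become claimable once this elapses.
    PartitionOwnershipExpirationInterval = TimeSpan.FromSeconds(15),

    // Greedy claims all eligible unowned partitions in a single cycle instead of one per cycle.
    LoadBalancingStrategy = LoadBalancingStrategy.Greedy
};

// Placeholder connection details; substitute your own.
var checkpointStore = new BlobContainerClient("<storage-connection-string>", "<checkpoint-container>");

var processor = new EventProcessorClient(
    checkpointStore,
    "<consumer-group>",
    "<event-hubs-connection-string>",
    "<event-hub-name>",
    options);

// Start and stop as usual, e.g. await processor.StartProcessingAsync();
```

With the greedy strategy, the new processors can claim all of the expired partitions in a single cycle, which helps keep the takeover close to the bound mentioned in the comment above.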
Hi @rzepinskip. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.