Geo DR Recovery Breaks Consumers On Restart
Hi,
I was testing the Geo DR recovery story here.
I’ve found that after a failover has completed and I restart the consumer, I constantly receive errors like:
Error: The supplied offset ‘4984’ is invalid. The last offset in the system is ‘96’ TrackingId:cae58860-80ef-4c0b-8fd9-86658d4c31d9_B24
I have tried setting InitialOffsetProvider = (_) => EventPosition.FromEnd(), but this seems to have no effect. The processor still tries to resume from the old committed offsets. (Why have the option at all if it is ignored?)
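The behavior described here is consistent with a stored checkpoint taking precedence over the initial offset provider. The following is a minimal model of that precedence in plain Python, not the .NET SDK; `resolve_start_position` and `END_OF_STREAM` are hypothetical names, and the assumption that the provider is consulted only when no checkpoint exists is inferred from the symptoms above.

```python
from typing import Callable, Optional

# Hypothetical marker standing in for EventPosition.FromEnd().
END_OF_STREAM = "@end"


def resolve_start_position(
    partition_id: str,
    checkpoint_offset: Optional[str],
    initial_offset_provider: Callable[[str], str],
) -> str:
    """Pick where a partition receiver starts reading (assumed precedence)."""
    # Assumption: a stored checkpoint always wins; the provider is a fallback.
    if checkpoint_offset is not None:
        return checkpoint_offset
    return initial_offset_provider(partition_id)


# Checkpoint written against the old primary hub: the provider is never asked.
with_checkpoint = resolve_start_position("0", "4984", lambda _pid: END_OF_STREAM)
# No checkpoint (for example, after deleting the blob container).
without_checkpoint = resolve_start_position("0", None, lambda _pid: END_OF_STREAM)

print(with_checkpoint)     # 4984
print(without_checkpoint)  # @end
```

Under that assumption, the stale offset 4984 is handed to the new hub regardless of what the provider returns, which would produce the error shown above.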
I’m using the sample code provided here
Order of events
- Create primary Event Hubs namespace and hub
- Create secondary Event Hubs namespace in a different region with no hub
- Go to Geo-Recovery in primary namespace and link secondary namespace with a new alias
- Start the Receiver
- Start the Sender
- Initiate failover in Azure portal
- note: there is no stopping of sender/receiver while failover is happening, there are no errors before, during, or after failover so long as processes aren’t stopped.
- After failover completes, stop sender and receiver
- Start receiver, results in error above
- Start sender, results in no errors
The only way I have gotten these errors to stop and everything running normally again is to delete the Storage Account blob container that contains the commits.
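A narrower alternative to deleting the whole container might be to drop only the checkpoints that cannot be valid on the new hub. The sketch below is plain Python with hypothetical names (`find_stale_checkpoints`, and dictionaries standing in for the checkpoint store and the per-partition last offsets); it is not part of any SDK. It only catches checkpoints that lie beyond the end of the new hub, so a stale offset that happens to fall inside the new hub's range would go undetected.

```python
from typing import Dict, List


def find_stale_checkpoints(
    checkpoints: Dict[str, int],
    last_offsets: Dict[str, int],
) -> List[str]:
    """Return partition ids whose checkpoint lies beyond the hub's last offset."""
    stale = []
    for partition_id, offset in checkpoints.items():
        # A partition missing from last_offsets is treated as empty,
        # so any checkpoint stored for it counts as stale.
        last = last_offsets.get(partition_id, -1)
        if offset > last:
            stale.append(partition_id)
    return sorted(stale)


# Values taken from the error message: checkpoint 4984, last offset 96.
checkpoints = {"0": 4984, "1": 40}
last_offsets = {"0": 96, "1": 120}
print(find_stale_checkpoints(checkpoints, last_offsets))  # ['0']
```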
This is an important scenario for my team, and it seems that if I set up geo-recovery it will break all of my consumers, or I will have to tell all my readers to delete their commit blobs whenever a failover happens and their application restarts.
Is there any fix for this?
It seems that if EventProcessorHost used InitialOffsetProvider from EventProcessorOptions instead of ignoring it, this wouldn’t be an issue: the reader would read new data from the partitions without checking for the previous offsets (which don’t exist because they come from a different hub).
Versions
- OS platform and version: Windows 10 1903
- .NET Version: Core 2.1
- NuGet package version or commit ID:
Issue Analytics
- State:
- Created 4 years ago
- Comments: 11 (5 by maintainers)
Top GitHub Comments
Please note that this is not limited to EPH: the underlying receivers also won’t handle the DR namespace switch and will therefore need to be restarted. I will find out where we can add this information to the public documentation.
Currently there is no ETA, but I can say this will be addressed before the end of this year.
@keggster101020
Please let us know if running a monitor job to check failover state resolved the issue.
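For readers wondering what such a monitor job could look like, here is a minimal polling sketch in plain Python. All names are hypothetical: `get_state` stands in for however the failover state is queried (for example, through a management API, not shown), and `on_failover` stands in for whatever cleanup is needed, such as clearing checkpoints and restarting receivers. The state strings are placeholders, not real service values.

```python
import time
from typing import Callable, Optional


def watch_for_failover(
    get_state: Callable[[], str],
    on_failover: Callable[[], None],
    poll_seconds: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the failover state and call on_failover once when it changes."""
    initial_state = get_state()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        polls += 1
        if get_state() != initial_state:
            on_failover()
            return True
    # Gave up after max_polls without seeing a change.
    return False


# Simulated run: the state changes on the second poll.
states = iter(["paired", "paired", "failed-over"])
actions = []
fired = watch_for_failover(
    get_state=lambda: next(states),
    on_failover=lambda: actions.append("clear checkpoints and restart"),
    poll_seconds=0,
    max_polls=5,
    sleep=lambda _seconds: None,  # skip real waiting in the simulation
)
print(fired, actions)  # True ['clear checkpoints and restart']
```

Injecting `sleep` and capping the loop with `max_polls` keeps the sketch testable; a real job would more likely run indefinitely with a real delay.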
I’m closing this issue for now, but if your issue isn’t resolved please open a new issue and reference this.