
Geo DR Recovery Breaks Consumers On Restart

See original GitHub issue

Hi,

I was testing the Geo DR recovery story here.

I’ve found that after a failover has completed and I restart the consumer, I constantly receive errors that look like:

Error: The supplied offset '4984' is invalid. The last offset in the system is '96' TrackingId:cae58860-80ef-4c0b-8fd9-86658d4c31d9_B24

I have tried setting InitialOffsetProvider = (_) => EventPosition.FromEnd(), but this seems to have no effect. The processor still tries to go through the old commit offsets (why even have the option in the first place if it's ignored?).

I’m using the sample code provided here
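
For reference, the relevant part of that sample boils down to roughly the following. This is a sketch rather than my exact code: the hub name, connection strings, and container name are placeholders, and SimpleEventProcessor here is a trimmed-down stand-in for the sample's processor class.

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.EventHubs.Processor;

    // Trimmed-down stand-in for the sample's SimpleEventProcessor.
    class SimpleEventProcessor : IEventProcessor
    {
        public Task OpenAsync(PartitionContext context) => Task.CompletedTask;

        public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;

        public Task ProcessErrorAsync(PartitionContext context, Exception error)
        {
            Console.WriteLine($"Error on partition {context.PartitionId}: {error.Message}");
            return Task.CompletedTask;
        }

        public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
        {
            foreach (var eventData in messages)
            {
                Console.WriteLine($"Partition {context.PartitionId}, offset {eventData.SystemProperties.Offset}");
            }
            // Writes the current offset/sequence number into the lease blob container.
            await context.CheckpointAsync();
        }
    }

    class Program
    {
        static async Task Main()
        {
            // Placeholders: the connection string is the Geo-DR *alias* connection string.
            var host = new EventProcessorHost(
                "my-hub",                                   // Event Hub name
                PartitionReceiver.DefaultConsumerGroupName, // "$Default"
                "<alias-connection-string>",
                "<storage-connection-string>",
                "leases");                                  // blob container for leases/checkpoints

            var options = new EventProcessorOptions
            {
                // Only consulted for partitions that have NO checkpoint in the container;
                // partitions with an existing (pre-failover) checkpoint still use that checkpoint.
                InitialOffsetProvider = partitionId => EventPosition.FromEnd()
            };

            await host.RegisterEventProcessorAsync<SimpleEventProcessor>(options);

            Console.WriteLine("Receiving. Press ENTER to stop.");
            Console.ReadLine();
            await host.UnregisterEventProcessorAsync();
        }
    }

As far as I can tell, InitialOffsetProvider is only consulted when a partition has no checkpoint at all, which would explain why it appears to be ignored once the container already holds pre-failover checkpoints.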


Order of events

  1. Create the primary Event Hubs namespace and hub
  2. Create the secondary Event Hubs namespace in a different region, with no hub
  3. Go to Geo-Recovery in primary namespace and link secondary namespace with a new alias
  4. Start the Receiver
  5. Start the Sender
  6. Initiate failover in Azure portal
    • note: the sender/receiver are not stopped while the failover is happening; there are no errors before, during, or after the failover as long as the processes aren’t stopped.
  7. After failover completes, stop sender and receiver
  8. Start receiver, results in error above
  9. Start sender, results in no errors

The only way I have gotten these errors to stop and get everything running as normal again is to delete the Storage Account blob container that contains the commits.
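
The cleanup I am doing by hand is essentially the following (a rough sketch assuming the WindowsAzure.Storage package; the container name is whatever was passed as leaseContainerName to EventProcessorHost):

    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    static class CheckpointReset
    {
        // Drops the lease/checkpoint container so the processor host starts clean on the next run.
        public static async Task ResetAsync(string storageConnectionString, string leaseContainerName)
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
            CloudBlobClient client = account.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference(leaseContainerName);

            // Removes all leases and checkpoints for this hub/consumer group;
            // the next receiver run then falls back to InitialOffsetProvider.
            await container.DeleteIfExistsAsync();
        }
    }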


This is an important scenario for my team, and it seems as though setting up geo-recovery will break all of my consumers, or I will have to tell all of my readers to delete their commit blobs whenever a failover happens and their application restarts.

Is there any fix for this? It seems as though if EventProcessorHost used InitialOffsetProvider from EventProcessorOptions instead of ignoring it, this wouldn’t be an issue: the reader would read new data from the partitions without checking for the previous offsets (which don’t exist, because they come from a different hub).
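
As a stopgap, the stale checkpoints can at least be detected before the host is started, by comparing the last checkpointed sequence number against what the hub behind the alias currently reports. This is only a sketch: the connection string must include the EntityPath, and where the checkpointed sequence number is read from is left to the application.

    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs;

    static class FailoverCheck
    {
        // Heuristic: if the checkpointed sequence number is AHEAD of what the hub reports as its
        // last enqueued sequence number, the namespace behind the alias has changed (a failover
        // happened) and the stored checkpoints are no longer valid for this hub.
        public static async Task<bool> CheckpointLooksStaleAsync(
            string aliasConnectionString, string partitionId, long checkpointedSequenceNumber)
        {
            EventHubClient client = EventHubClient.CreateFromConnectionString(aliasConnectionString);
            try
            {
                EventHubPartitionRuntimeInformation info =
                    await client.GetPartitionRuntimeInformationAsync(partitionId);

                return checkpointedSequenceNumber > info.LastEnqueuedSequenceNumber;
            }
            finally
            {
                await client.CloseAsync();
            }
        }
    }

If this returns true for any partition, the lease container can be reset (as above) before registering the processor, so the receivers start from EventPosition.FromEnd() instead of the dead offsets.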

Versions

  • OS platform and version: Windows 10 1903
  • .NET Version: Core 2.1
  • NuGet package version or commit ID:

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
serkantkaraca commented, Apr 1, 2019

Please note that this is not only for EPH; the underlying receivers also won’t handle the DR namespace switch and will need to be restarted. I will find out where we can put appropriate information into the public documentation.

Currently there is no ETA, but I can say this will be addressed before the end of this year.
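
In practice, a receiver "restart" on the processor-host side amounts to something like the sketch below. The helper and its parameters are illustrative, not part of the SDK; clearing stale checkpoints in between is what actually prevents the invalid-offset error against the new namespace.

    using System.Threading.Tasks;
    using Microsoft.Azure.EventHubs.Processor;

    static class ReceiverRestart
    {
        // Unregister the old host and bring up a fresh one against the Geo-DR alias.
        public static async Task<EventProcessorHost> RestartAsync<TProcessor>(
            EventProcessorHost oldHost,
            string hubName, string consumerGroup, string aliasConnectionString,
            string storageConnectionString, string leaseContainer,
            EventProcessorOptions options)
            where TProcessor : IEventProcessor, new()
        {
            await oldHost.UnregisterEventProcessorAsync();

            // ... reset stale checkpoints here if a failover was detected
            //     (see the container cleanup sketch earlier in the thread) ...

            var newHost = new EventProcessorHost(
                hubName, consumerGroup, aliasConnectionString, storageConnectionString, leaseContainer);
            await newHost.RegisterEventProcessorAsync<TProcessor>(options);
            return newHost;
        }
    }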

0 reactions
axisc commented, Aug 21, 2019

@keggster101020

Please let us know if running a monitor job to check failover state resolved the issue.

I’m closing this issue for now, but if your issue isn’t resolved please open a new issue and reference this.
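
A minimal version of such a monitor might poll the alias's disaster-recovery configuration through the management plane. The sketch below assumes the Microsoft.Azure.Management.EventHub and Microsoft.Rest.ClientRuntime.Azure.Authentication packages and a service principal with read access on the namespace; exact model and property names may differ slightly between SDK versions.

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Management.EventHub;
    using Microsoft.Azure.Management.EventHub.Models;
    using Microsoft.Rest.Azure.Authentication;

    static class FailoverMonitor
    {
        // Reads the Geo-DR alias configuration for a namespace and returns its current role
        // (e.g. Primary, Secondary, PrimaryNotReplicating). A change in this value is the signal
        // that a failover happened and that receivers should reset checkpoints and restart.
        public static async Task<string> GetAliasRoleAsync(
            string tenantId, string clientId, string clientSecret,
            string subscriptionId, string resourceGroup, string namespaceName, string aliasName)
        {
            var credentials = await ApplicationTokenProvider.LoginSilentAsync(tenantId, clientId, clientSecret);
            using (var client = new EventHubManagementClient(credentials) { SubscriptionId = subscriptionId })
            {
                ArmDisasterRecovery dr =
                    await client.DisasterRecoveryConfigs.GetAsync(resourceGroup, namespaceName, aliasName);

                Console.WriteLine($"Alias {aliasName}: role={dr.Role}, partner={dr.PartnerNamespace}");
                return dr.Role.ToString();
            }
        }
    }

A scheduled job could call this every few minutes and, when the role changes, trigger the checkpoint reset and host restart described above.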

Read more comments on GitHub >

Top Results From Across the Web

  • Azure Event Hubs - Geo-disaster recovery
    The Geo-disaster recovery feature ensures that the entire configuration of a namespace (Event Hubs, consumer groups, and settings) is ...
  • Disaster Recovery (Geo)
    Re-enable migrations now that PostgreSQL is restarted and listening on the private address. Edit /etc/gitlab/gitlab.rb and change the configuration to true ...
  • Disaster recovery for planned failover
    As replication between Geo sites is asynchronous, a planned failover requires a maintenance window in which updates to the primary site are blocked ...
  • Architecting disaster recovery for cloud infrastructure outages
    Step-by-step guide to designing disaster recovery for applications in Google ... Data will be stored in a single region within the geographic location ...
  • 10.11. Troubleshooting Geo-replication
    After restarting geo-replication, it will begin a synchronization of the data using checksums. This may be a long and resource-intensive process ...
