Changefeed - scale out support
Is your feature request related to a problem? Please describe.
A long-running ChangeFeedIterator has no support for scale-out scenarios: as the data and the number of partitions grow, it is not possible to split the FeedIterator when a partition is split.
In our scenario, we create a new container, iterate over its FeedRanges, and create a FeedIterator for each FeedRange. So at the beginning we end up with very few FeedIterators.
As the data grows to thousands of partitions, we still have only the initial number of FeedIterators and no way to scale out.
The processing is also too CPU/IO-heavy to run entirely on one server.
Describe the solution you’d like
The best case would be to be notified when a partition split happens, including which FeedRange was split and what the new FeedRanges are.
Alternatively, we would like to be able to check whether a FeedRange represents multiple partitions, with the possibility of splitting it into multiple FeedRanges.
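To make the second request concrete, here is a minimal sketch of what such a check could look like. It is not the Cosmos DB SDK API: the Range type, the integer key space, and split_by_partitions are hypothetical names, and the list of physical partition ranges is assumed to be known to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    """Hypothetical half-open interval [start, end) over a hash key space."""
    start: int
    end: int


def split_by_partitions(feed_range: Range, partitions: List[Range]) -> List[Range]:
    """Return the pieces of feed_range that fall inside each physical partition."""
    pieces = []
    for partition in partitions:
        lo = max(feed_range.start, partition.start)
        hi = min(feed_range.end, partition.end)
        if lo < hi:  # keep only non-empty overlaps
            pieces.append(Range(lo, hi))
    return pieces


# Assumed layout after splits: the original range now covers three partitions.
original = Range(0, 100)
partitions_after_split = [Range(0, 50), Range(50, 75), Range(75, 100), Range(100, 200)]

pieces = split_by_partitions(original, partitions_after_split)
spans_multiple = len(pieces) > 1
```

If spans_multiple is true, each piece could back its own FeedIterator, which is the scale-out step the issue asks for.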
Describe alternatives you’ve considered
Changefeed processor. It is not a good fit for our use case: we need more control over how things are processed, e.g. closer to real-time processing and ordering/dependency logic between partitions. It would also require duplicating some of our infrastructure.
Additional context
Issue Analytics
- State:
- Created 2 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
I didn’t mean that it should work without any interruption. Of course, the FeedRange that was split will be interrupted so that new iterators can be created. But other FeedRanges without a split should not be affected.
If we exposed a Scale API, it would require you to stop the iterator to handle and process the split. If you are running 1 iterator and want to check, after X iterations, whether it can be scaled, you’d need to pass the current FeedRange / continuation, and hypothetically, the response would be the FeedRanges/continuations you can use after the split. So you would use maybe 1 of the results on the current machine to create a new iterator and relay the other results to other new instances.
I don’t see a way to make the process work without interruptions.
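The stop, scale, and hand-off flow described in this comment could be sketched roughly as follows. No such Scale API exists in the SDK as far as this thread shows; scale, redistribute, the tuple-based FeedRange, and the integer split boundaries are all hypothetical. Reusing the parent's continuation unchanged for every child range is also an assumption made only to keep the sketch short.

```python
from typing import List, Tuple

FeedRange = Tuple[int, int]            # hypothetical half-open range [start, end)
ScaleResult = Tuple[FeedRange, str]    # (child range, continuation to resume from)


def scale(current: FeedRange, continuation: str, boundaries: List[int]) -> List[ScaleResult]:
    """Hypothetical Scale call: cut the stopped range at the known split boundaries."""
    start, end = current
    cuts = [start] + sorted(b for b in boundaries if start < b < end) + [end]
    # Assumption: every child resumes from the parent's continuation.
    return [((lo, hi), continuation) for lo, hi in zip(cuts, cuts[1:])]


def redistribute(results: List[ScaleResult]) -> Tuple[ScaleResult, List[ScaleResult]]:
    """Keep the first result on this machine and relay the rest to other instances."""
    return results[0], results[1:]


# The iterator over (0, 100) has been stopped at "token-at-stop" before this point.
results = scale((0, 100), "token-at-stop", boundaries=[50, 75])
local, relayed = redistribute(results)
```

Only the iterator whose range was split has to stop here; iterators over other ranges are untouched, which matches the expectation stated in the first comment.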