[Design Discussion] Should the Event Processor Client Support Creating Checkpoints for Arbitrary Events?
Summary
The current design of the EventProcessorClient invokes an event handler to process data read from the Event Hubs service. The arguments passed to that event handler include a method for creating/updating a checkpoint based on the EventData instance associated with them. This method takes no parameters and relies on implicit context when called. It cannot be used to manipulate checkpoint data for events other than the one associated with the arguments.
At present, no other means of creating/updating a checkpoint is surfaced as part of the EventProcessorClient API. This makes things difficult for scenarios in which applications would prefer to create a checkpoint “after every XX number of events or YY amount of time has passed”, which is not an uncommon use case.
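To make the “every XX events or YY time” scenario concrete, here is a minimal sketch of a threshold tracker that an application could consult from its event handler under the current model. `CheckpointPolicy` and `ShouldCheckpoint` are hypothetical names invented for illustration and are not part of the Event Hubs SDK. The sketch assumes C# 9 or later and that events for a single partition are handed to the handler one at a time.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of the Event Hubs SDK: decides when a partition
// is due for a checkpoint based on an event-count or elapsed-time threshold.
public sealed class CheckpointPolicy
{
    private readonly int _maxEvents;
    private readonly TimeSpan _maxInterval;
    private readonly Func<DateTimeOffset> _clock;

    // Per-partition state: events seen since the last checkpoint, and when it was due.
    private readonly ConcurrentDictionary<string, (int Count, DateTimeOffset Last)> _state = new();

    public CheckpointPolicy(int maxEvents, TimeSpan maxInterval, Func<DateTimeOffset> clock = null)
    {
        _maxEvents = maxEvents;
        _maxInterval = maxInterval;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Call once per processed event. Returns true when a checkpoint is due
    // and, in that case, resets the counters for the partition.
    public bool ShouldCheckpoint(string partitionId)
    {
        var now = _clock();

        if (!_state.TryGetValue(partitionId, out var entry))
        {
            entry = (0, now);
        }

        entry.Count++;

        var due = entry.Count >= _maxEvents || (now - entry.Last) >= _maxInterval;
        _state[partitionId] = due ? (0, now) : entry;

        return due;
    }
}
```

Inside the handler, this would be used roughly as `if (policy.ShouldCheckpoint(args.Partition.PartitionId)) await args.UpdateCheckpointAsync(args.CancellationToken);`. Note that the time threshold is only evaluated when an event arrives, so a quiet partition would not be checkpointed by this sketch alone.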
Scope of Discussion
- Should the EventProcessorClient support a method to create/update checkpoints based on an arbitrary event?
- Would such a method be more usable if it accepted the required data as individual arguments, an EventProcessorCheckpoint and offset value, or an EventProcessorCheckpoint and EventData instance?
- Is there another potential design for enabling the scenario that should be considered?
Out of Scope
- General changes to the EventProcessorClient unrelated to the theme of creating/updating checkpoints.
Concept Illustration
public class EventProcessorClient
{
    // This form assumes the context for the Event Hub and consumer group are sourced from the
    // EventProcessorClient and not provided individually.
    public async Task<EventHubCheckpoint> UpdateCheckpointAsync(
        string partitionId,
        long offset,
        CancellationToken cancellationToken = default);
}

var storageClient = new BlobContainerClient(<< ARGS >>);
var processor = new EventProcessorClient(storageClient, "<< CONSUMER GROUP >>", "<< CONNECTION STRING >>");

// Create a checkpoint for partition "0" using offset 12345
await processor.UpdateCheckpointAsync("0", 12345);
Considerations
- The current model requires caching the arguments passed to the event handler for each event/partition combination for which a checkpoint may be desired. The cache is typically refreshed on each invocation of the event handler. When the threshold for checkpointing is reached, the cached arguments are referenced and their checkpoint method is called.
- The proposed concept would not remove the burden of having to track and cache information; it would, however, reduce the set of information being tracked to just the partition identifier and the offset of the desired event.
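As a rough illustration of the reduced bookkeeping described above, the following sketch tracks only a partition identifier and an offset. `OffsetTracker` is a hypothetical name, and the delegate passed to it stands in for the proposed `UpdateCheckpointAsync(partitionId, offset, cancellationToken)` method, which does not exist in the current API. The sketch assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: under the proposed concept, only the partition identifier
// and the offset of the most recently processed event need to be remembered.
public sealed class OffsetTracker
{
    private readonly ConcurrentDictionary<string, long> _latest = new();

    // Stand-in for the proposed processor.UpdateCheckpointAsync(partitionId, offset, token);
    // that method is an assumption and is not part of the current API.
    private readonly Func<string, long, CancellationToken, Task> _updateCheckpoint;

    public OffsetTracker(Func<string, long, CancellationToken, Task> updateCheckpoint)
        => _updateCheckpoint = updateCheckpoint;

    // Called from the event handler after an event has been processed.
    public void Record(string partitionId, long offset) => _latest[partitionId] = offset;

    // Called when the application's threshold (event count or time) is reached.
    public async Task FlushAsync(CancellationToken cancellationToken = default)
    {
        foreach (var pair in _latest)
        {
            // Remove the entry only if the offset is still the one just read, so a
            // newer offset recorded concurrently is kept for the next flush.
            if (_latest.TryRemove(pair))
            {
                await _updateCheckpoint(pair.Key, pair.Value, cancellationToken);
            }
        }
    }
}
```

In this sketch an offset is dropped from the tracker before the checkpoint call completes, so a failed call would need to be handled by the caller, for example by recording the offset again.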
Issue Analytics
- State:
- Created 3 years ago
- Comments: 12 (6 by maintainers)
Top GitHub Comments
That’s an interesting thought. That would also give us a query point to answer the question “what partitions are owned by the processor?” I wonder if we would want to track the event or just the offset. Something like:
I’m sorely missing a timer-based setting right now. Something like … var clientOptions = new EventProcessorClientOptions { CheckpointFrequency = TimeSpan.FromMinutes(5) };
Including this simple property would avoid the complexity and risk of building an EventProcessor<TPartition>-based solution.
Many business use cases have quiet periods, handlers falling asleep, and overnight/intra-day processing cut-offs. IMO, in such cases the additional peace of mind, the reduced impact on RPO, and the reduced load on idempotent processing would be beneficial to the community.