
[Design Discussion] Should the Event Processor Client Support Creating Checkpoints for Arbitrary Events?


Summary

The current design of the EventProcessorClient invokes an event handler to process data read from the Event Hubs service. The arguments passed to that handler include a method for creating/updating a checkpoint based on the EventData instance associated with those arguments. This method takes no parameters and relies on implicit context when called. It cannot be used to manipulate checkpoint data for events other than the one associated with the arguments.

At present, the EventProcessorClient API surfaces no other means of creating/updating a checkpoint. This makes things difficult for applications that would prefer to create a checkpoint “after every XX number of events or YY amount of time has passed”, which is not an uncommon use case.

Scope of Discussion

Should the EventProcessorClient support a method to create/update checkpoints based on an arbitrary event?
Would such a method be more usable if it accepted the required data as individual arguments, an EventProcessorCheckpoint and offset value, or an EventProcessorCheckpoint and EventData instance?
Is there another potential design for enabling the scenario that should be considered?

Out of Scope

General changes to the EventProcessorClient unrelated to the theme of creating/updating checkpoints.

Concept Illustration

public class EventProcessorClient
{
    // Signature only; the body is omitted because this illustrates the proposed API surface.
    // This form assumes the context for the Event Hub and consumer group are sourced from the
    // EventProcessorClient and not provided individually.

    public Task<EventHubCheckpoint> UpdateCheckpointAsync(
        string partitionId,
        long offset,
        CancellationToken cancellationToken = default);
}

var storageClient = new BlobContainerClient(<< ARGS >>);
var processor = new EventProcessorClient(storageClient, "<< CONSUMER GROUP >>", "<< CONNECTION STRING >>");

// Create a checkpoint for partition "0" using offset 12345
await processor.UpdateCheckpointAsync("0", 12345);

Considerations

The current model requires caching the arguments passed to the event handler for each event/partition combination that may later need a checkpoint. In practice, the cache is usually overwritten on each invocation of the event handler. When the checkpointing threshold is reached, the cached arguments are retrieved and their checkpoint method is called.
The proposed concept would not remove the burden of having to track and cache information; it would, however, reduce the set of information being tracked to just the partition identifier and the offset of the desired event.
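
For the “every XX events or YY time” scenario, one option under the current model is to evaluate the threshold inside the handler and checkpoint with whichever event crosses it. The following is a minimal sketch, not SDK code: `CheckpointGate` and its members are hypothetical names, and the `Func<CancellationToken, Task>` parameter stands in for the parameterless checkpoint method on the handler's event arguments. It assumes that events for a single partition are handled one at a time, while different partitions may be handled concurrently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Event Hubs SDK): decides, per partition,
// when enough events or enough time has passed to write a checkpoint.
public sealed class CheckpointGate
{
    private sealed class State
    {
        public int EventsSinceCheckpoint;
        public DateTimeOffset LastCheckpoint;
    }

    private readonly int _maxEvents;
    private readonly TimeSpan _maxAge;
    private readonly Func<DateTimeOffset> _clock;
    private readonly ConcurrentDictionary<string, State> _partitions = new();

    public CheckpointGate(int maxEvents, TimeSpan maxAge)
        : this(maxEvents, maxAge, () => DateTimeOffset.UtcNow)
    {
    }

    // The clock is injectable so the time threshold can be exercised in tests.
    public CheckpointGate(int maxEvents, TimeSpan maxAge, Func<DateTimeOffset> clock)
    {
        if (maxEvents <= 0) throw new ArgumentOutOfRangeException(nameof(maxEvents));
        _maxEvents = maxEvents;
        _maxAge = maxAge;
        _clock = clock ?? throw new ArgumentNullException(nameof(clock));
    }

    // Call once per processed event. Returns true when a checkpoint was written.
    public async Task<bool> OnEventProcessedAsync(
        string partitionId,
        Func<CancellationToken, Task> updateCheckpoint,
        CancellationToken cancellationToken = default)
    {
        var now = _clock();
        var state = _partitions.GetOrAdd(partitionId, _ => new State { LastCheckpoint = now });
        state.EventsSinceCheckpoint++;

        var belowCount = state.EventsSinceCheckpoint < _maxEvents;
        var belowAge = now - state.LastCheckpoint < _maxAge;
        if (belowCount && belowAge)
        {
            return false;
        }

        // Checkpoint using the event that crossed the threshold.
        await updateCheckpoint(cancellationToken).ConfigureAwait(false);
        state.EventsSinceCheckpoint = 0;
        state.LastCheckpoint = now;
        return true;
    }
}
```

Inside a handler this might be called as `await gate.OnEventProcessedAsync(args.Partition.PartitionId, ct => args.UpdateCheckpointAsync(ct));`. Because the check only runs when an event arrives, the time limit cannot fire during a quiet period; that case still needs cached arguments or the offset-based method proposed here.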

Issue Analytics

State:
Created 3 years ago
Comments: 12 (6 by maintainers)

Top GitHub Comments


jsquire commented, Apr 24, 2020

In your proposed solution above, how would I get the offsets? Would the burden be on the developer to track this? Why not store the array of partitions and the current offset of each within the processor, so that I can make a call based on the latest offset for each partition?

That’s an interesting thought. That would also give us a query point to answer the question “what partitions are owned by the processor?” I wonder if we would want to track the event or just the offset. Something like:

public class EventProcessorClient
{
    public EventProcessorClientPartition OwnedPartitions { get; }
}

// As suggested
public class EventProcessorClientPartition
{
    public string PartitionId { get; }
    public long LastProcessedEventOffset { get; }
}

// Alternative thought
public class EventProcessorClientPartition
{
    public string PartitionId { get; }
    public EventData LastProcessedEvent { get; }
}
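
Until something like this exists on the client, the same bookkeeping can be kept in application code. The sketch below is only an illustration of the “as suggested” shape: `PartitionOffsetTracker` and its members are hypothetical names, and the offset passed to `RecordProcessed` is assumed to come from the event being handled.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical application-side tracker (not part of the Event Hubs SDK):
// remembers the offset of the last processed event for each partition.
public sealed class PartitionOffsetTracker
{
    private readonly ConcurrentDictionary<string, long> _offsets = new();

    // Call from the event handler after an event has been processed.
    public void RecordProcessed(string partitionId, long offset)
        => _offsets[partitionId] = offset;

    public bool TryGetLastProcessedOffset(string partitionId, out long offset)
        => _offsets.TryGetValue(partitionId, out offset);

    // Point-in-time copy, usable to answer "which partitions has this processor seen?".
    public IReadOnlyDictionary<string, long> Snapshot()
        => new Dictionary<string, long>(_offsets.ToArray());

    // Call when a partition is no longer being processed by this instance.
    public bool Release(string partitionId)
        => _offsets.TryRemove(partitionId, out _);
}
```

Tracking only the offset keeps the state small, which matches the reduced bookkeeping described under Considerations; tracking the whole event, as in the alternative, would hold each partition's last event in memory.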


minascasiou commented, Jul 25, 2022

I’m sorely missing a timer-based setting right now. Something like:

var clientOptions = new EventProcessorClientOptions { CheckpointFrequency = TimeSpan.FromMinutes(5) };

Including this simple property would save the complexity and risk of building an EventProcessor<TPartition>-based solution.

Many business use cases have quiet periods, handlers falling asleep, and overnight/intra-day processing cut-offs. IMO, in such cases the additional peace of mind, the reduced impact on RPO, and the lower load on idempotent processing would be beneficial to the community.
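
No `CheckpointFrequency` option is shown as existing in this discussion, so a timer-based checkpoint currently has to be built by the application. A rough sketch of one way to do that with the caching approach from Considerations follows; `TimedCheckpointer` and its members are hypothetical names, the cached `Func<CancellationToken, Task>` stands in for the checkpoint method on the handler's event arguments, and `PeriodicTimer` assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Event Hubs SDK): caches the newest
// checkpoint action per partition and writes checkpoints on a timer.
public sealed class TimedCheckpointer
{
    private readonly ConcurrentDictionary<string, Func<CancellationToken, Task>> _pending = new();

    // Call from the event handler; the newest event for a partition replaces the older one.
    public void Remember(string partitionId, Func<CancellationToken, Task> updateCheckpoint)
        => _pending[partitionId] = updateCheckpoint;

    // Checkpoints every partition that processed events since the last flush.
    // Returns the number of checkpoints written.
    public async Task<int> FlushAsync(CancellationToken cancellationToken = default)
    {
        var written = 0;
        foreach (var partitionId in _pending.Keys)
        {
            if (_pending.TryRemove(partitionId, out var updateCheckpoint))
            {
                await updateCheckpoint(cancellationToken).ConfigureAwait(false);
                written++;
            }
        }
        return written;
    }

    // Flushes on every tick until cancellation is requested.
    public async Task RunAsync(TimeSpan frequency, CancellationToken cancellationToken)
    {
        using var timer = new PeriodicTimer(frequency);
        try
        {
            while (await timer.WaitForNextTickAsync(cancellationToken).ConfigureAwait(false))
            {
                await FlushAsync(cancellationToken).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop the loop.
        }
    }
}
```

Unlike a check made inside the handler, this loop keeps running through quiet periods, so the last processed event of each partition is still checkpointed. The sketch does not handle a partition changing owners between `Remember` and the next tick; a real implementation would likely need to drop the cached entry when processing of that partition stops.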
