Support a MinBatchSize property in the WebJobs Event Hubs extension
Library or service name: `Microsoft.Azure.WebJobs.Extensions.EventHubs`
Is your feature request related to a problem? Please describe.
I have a scenario where I would like to aggregate events over time and then handle them as a batch. The size of this batch must be fairly large for my downstream service’s optimizations; for example, let’s say 10k events.
Using the Event Hubs SDK directly (`Azure.Messaging.EventHubs`), I could write a processor that aggregates incoming data into buckets (one bucket per partition) and, when a bucket hits a certain event-count threshold (or when enough time has passed since it was last updated), “flushes” the bucket and then updates the checkpoint for that particular partition. This way I am never at risk of data loss. The same approach cannot be taken with the current WebJobs SDK, because it automatically updates the checkpoint after every X batches are “processed”, so if my process crashes before it can flush a bucket, all the data in that bucket is lost.
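For illustration, a minimal sketch of that bucket-per-partition approach with `EventProcessorClient` might look like the following. The connection strings, `FlushThreshold`, and `FlushToDownstreamAsync` are placeholders, and the time-based flush mentioned above is omitted for brevity:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Messaging.EventHubs.Processor;
using Azure.Storage.Blobs;

class BucketingProcessor
{
    // Stand-in for the proposed MinBatchSize: flush once a partition has this many events.
    const int FlushThreshold = 10_000;

    // One bucket per partition. The processor invokes the event handler for a given
    // partition sequentially, so only the dictionary itself needs to be thread-safe.
    static readonly ConcurrentDictionary<string, List<EventData>> Buckets = new();

    static async Task Main()
    {
        var checkpointStore = new BlobContainerClient("<storage-connection-string>", "checkpoints");
        var processor = new EventProcessorClient(
            checkpointStore,
            EventHubConsumerClient.DefaultConsumerGroupName,
            "<event-hub-connection-string>",
            "<event-hub-name>");

        processor.ProcessEventAsync += async args =>
        {
            if (!args.HasEvent) return;

            var bucket = Buckets.GetOrAdd(args.Partition.PartitionId, _ => new List<EventData>());
            bucket.Add(args.Data);

            if (bucket.Count >= FlushThreshold)
            {
                await FlushToDownstreamAsync(bucket);   // user-defined batch handling
                bucket.Clear();

                // Checkpoint only after the bucket has been flushed; a crash before the
                // flush means the un-flushed events are replayed rather than lost.
                await args.UpdateCheckpointAsync(args.CancellationToken);
            }
        };

        processor.ProcessErrorAsync += args =>
        {
            Console.Error.WriteLine($"Partition '{args.PartitionId}': {args.Exception}");
            return Task.CompletedTask;
        };

        await processor.StartProcessingAsync();
        await Task.Delay(Timeout.Infinite);
    }

    // Placeholder for the downstream call that consumes an aggregated batch.
    static Task FlushToDownstreamAsync(IReadOnlyList<EventData> events) => Task.CompletedTask;
}
```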
The current SDK has a `MaxBatchSize` property, which puts an upper limit on the number of messages in a batch, but it isn’t related to the actual number of unprocessed messages in a given partition. Even if I have thousands of events waiting to be processed and my `MaxBatchSize` is set to 10k, I can still receive batches of 5-7 messages.
I am proposing a `MinBatchSize` property that tells the WebJobs SDK to aggregate data in in-memory buckets. If set, the internal processor will fill the relevant bucket instead of triggering, and will not update the checkpoint. When a bucket reaches `MinBatchSize`, or when the configured time span since the last event has elapsed, the processor will trigger the user code and afterwards update the checkpoint for the given partition.

As far as I understand, the “invoke after enough time has passed since the last event” behavior was removed in a recent PR (#19140), but it might be needed here.
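To make the proposed semantics concrete, here is a purely illustrative sketch of the buffering behavior described above, independent of the real extension internals. `MinBatchSize`, `MaxWaitTime`, and the two delegates are hypothetical names used only for this proposal, not existing extension APIs:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;

// One instance per partition; illustrates the proposed "fill, then trigger and checkpoint" flow.
public sealed class PartitionBucket
{
    private readonly List<EventData> _events = new();
    private DateTimeOffset _lastEventUtc = DateTimeOffset.UtcNow;

    public int MinBatchSize { get; init; } = 10_000;                        // proposed option
    public TimeSpan MaxWaitTime { get; init; } = TimeSpan.FromSeconds(30);  // proposed option

    // Called by the (hypothetical) internal processor for every event it receives.
    public async Task AddAsync(
        EventData @event,
        Func<IReadOnlyList<EventData>, Task> triggerFunctionAsync,   // runs the user's code
        Func<Task> checkpointAsync)                                  // checkpoints this partition
    {
        _events.Add(@event);
        _lastEventUtc = DateTimeOffset.UtcNow;

        if (_events.Count >= MinBatchSize)
        {
            await FlushAsync(triggerFunctionAsync, checkpointAsync);
        }
    }

    // Called by a (hypothetical) timer so partially filled buckets still drain eventually.
    public async Task FlushIfIdleAsync(
        Func<IReadOnlyList<EventData>, Task> triggerFunctionAsync,
        Func<Task> checkpointAsync)
    {
        if (_events.Count > 0 && DateTimeOffset.UtcNow - _lastEventUtc >= MaxWaitTime)
        {
            await FlushAsync(triggerFunctionAsync, checkpointAsync);
        }
    }

    private async Task FlushAsync(
        Func<IReadOnlyList<EventData>, Task> triggerFunctionAsync,
        Func<Task> checkpointAsync)
    {
        await triggerFunctionAsync(_events);   // invoke user code with the full bucket
        await checkpointAsync();               // only checkpoint after the bucket was handled
        _events.Clear();
    }
}
```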
Top GitHub Comments
This is desperately needed in my opinion. We are deploying Function Apps and Event Hubs into all our subscriptions and in multiple regions for a logging solution. This works out to be thousands of Function Apps and Event Hubs. As we are migrating away from our current solution to Function Apps, I am seeing a concerning price increase for the Storage Accounts associated with the Function Apps (where checkpoints are stored) for subscriptions that are producing a large volume of logs to the Event Hubs.

The volume of logs that we receive into the Event Hubs can be anywhere from tens of messages per minute to millions of messages per minute. One subscription we have is producing logs into the Event Hub at a steady rate of around 100k messages/minute. In our `host.json` file, we are specifying `maxBatchSize` as `1000`, but based on the logs in App Insights the vast majority (realistically almost all) of executions are only receiving a batch size of one. I am aware that `maxBatchSize` is not a guarantee of how many messages you will receive in a batch. This is causing us to have 30k+ function executions/minute. That alone is pricey (even though the executions are < 1 second). It has also caused an excessive amount of transactions against the Storage Account - I believe it was something like 90k transactions/minute. I modified the `batchCheckpointFrequency` field from `1` to `5` and that helped cut the transactions to < 30k/minute, but I still find that excessive. We are required to have advanced threat protection turned on for all Storage Accounts, and I believe I worked that out to be $40+ a day, which is ridiculous. Looking at the price breakdown for the Storage Account, advanced threat protection and write operations were the most costly. This one Storage Account has now become the majority of the cost for our resource group.

I would like to see a configuration available in `host.json` that would allow me to configure a minimum batch size (I am guaranteed to receive at least that many messages per execution) and a maximum batching window (if the minimum batch size is not met within some configurable amount of time, give me what you have so far). This is what AWS Lambda has with Kinesis and it is really useful. Link to some Terraform configuration that controls this here and here. This would hopefully cut down on the number of executions that my function has and ultimately reduce the number of transactions against the Storage Account. It would also help our code be more performant, since the downstream service we are sending logs to can be a bottleneck. To get around this we batch up logs into one request before pushing, but we aren’t getting that benefit if each function invocation is only dealing with one message at a time. As long as configurations are properly documented, I am happy to work through adjusting the values to meet my function’s needs.

Also, I would appreciate any advice to help cut down on our Storage Account transactions until this is implemented.
@JoshLove-msft: No, it hasn’t - it’s not something that we’ve seen feedback requesting.

For the majority case, `EventProcessorClient` is the processor type in use, which is single-dispatch for delivering events to handlers. For the `EventProcessor<T>` that underpins the Functions extensions, the focus has been to maximize throughput by dispatching as quickly as possible once any events are available in the prefetch queue.

I think the Functions scenario is somewhat unique in that there’s a cost associated with invocations. I’d be inclined to say that we should consider building this into the `EventProcessorHost` in the Functions bindings to start, and reassess moving it to the base class if we see more general demand for it.