
Support a MinBatchSize property in web job event hub extensions

See original GitHub issue

Library or service name: Microsoft.Azure.WebJobs.Extensions.EventHubs

Is your feature request related to a problem? Please describe.
I have a scenario where I would like to aggregate events over time and then handle them as a batch. The batch must be quite large for my downstream service’s optimizations; for example, let’s say 10k events.

Using the Event Hubs SDK directly (Azure.Messaging.EventHubs), I could write a processor that aggregates incoming data into buckets (one bucket per partition). When a bucket hits a certain event-count threshold (or when enough time has passed since it was last updated), I would “flush” the bucket and then update the checkpoint for that particular partition; this way I am never at risk of data loss. The same approach cannot be taken with the current WebJobs SDK, as it automatically updates the checkpoint after every X batches are “processed”, so if my process crashes before it can flush a bucket, all the data in that bucket is lost.
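The flush-then-checkpoint ordering described above can be sketched as follows. This is a language-agnostic illustration only: `flush` and `checkpoint` stand in for user code and for the processor’s checkpoint call (e.g. `UpdateCheckpointAsync` in the .NET SDK); the names and thresholds are hypothetical, not real SDK APIs.

```python
import time

FLUSH_THRESHOLD = 10_000   # flush when a bucket reaches this many events
FLUSH_IDLE_SECONDS = 30    # or when no new event has arrived for this long

buckets = {}               # partition_id -> {"events": [...], "last_update": ts}

def on_event(partition_id, event, flush, checkpoint, now=None):
    """Buffer one event; flush and only then checkpoint when the bucket is full."""
    now = now if now is not None else time.monotonic()
    bucket = buckets.setdefault(partition_id, {"events": [], "last_update": now})
    bucket["events"].append(event)
    bucket["last_update"] = now
    if len(bucket["events"]) >= FLUSH_THRESHOLD:
        flush(partition_id, bucket["events"])   # hand the batch to user code first...
        checkpoint(partition_id, event)         # ...checkpoint only after the flush succeeds
        bucket["events"] = []
```

Because the checkpoint is written only after a successful flush, a crash mid-bucket merely replays unflushed events rather than losing them.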

The current SDK has a MaxBatchSize property, which puts an upper limit on the number of messages in a batch, but it isn’t related to the actual number of unprocessed messages in a given partition. So even if I have thousands of events waiting to be processed and MaxBatchSize is set to 10k, I can still receive batches of 5–7 messages.

I am proposing a MinBatchSize property that would tell the WebJobs SDK to aggregate data in in-memory buckets. If set, the internal processor would fill the relevant bucket instead of triggering, and would not update the checkpoint. When a bucket hits MinBatchSize, or when the configured time span since the last event has expired, the processor would trigger the user code and afterwards update the checkpoint for the given partition.
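The proposed dispatch rule, “batch is full, or the wait window has elapsed”, can be modeled as a small decision function. The property names below (MIN_BATCH_SIZE, MAX_WAIT_SECONDS) are hypothetical and only illustrate the requested semantics:

```python
MIN_BATCH_SIZE = 10_000   # hypothetical MinBatchSize setting
MAX_WAIT_SECONDS = 60     # hypothetical maximum batching window

def should_trigger(bucket_len, seconds_since_last_event):
    """True when the buffered batch should be dispatched to user code
    (with the partition checkpoint updated afterwards)."""
    if bucket_len == 0:
        return False
    return (bucket_len >= MIN_BATCH_SIZE
            or seconds_since_last_event >= MAX_WAIT_SECONDS)
```

An empty bucket never triggers, so an idle partition incurs no empty invocations or checkpoint writes.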

As far as I understand, the “invoke after enough time has passed since the last event” behavior was removed in a recent PR (#19140), but it might be needed here.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

2 reactions
ZachTB123 commented, Mar 24, 2021

This is desperately needed in my opinion. We are deploying Function Apps and Event Hubs into all our subscriptions and in multiple regions for a logging solution. This works out to be thousands of Function Apps and Event Hubs. As we are migrating away from our current solution to Function Apps, I am seeing a concerning price increase for the Storage Accounts associated with the Function Apps (where checkpoints are stored) for subscriptions that are producing a large volume of logs to the Event Hubs.

The volume of logs that we receive into the Event Hubs can be anywhere from tens of messages per minute to millions of messages per minute. One subscription we have is producing logs into the Event Hub at a steady rate of around 100k messages/minute. In our host.json file, we are specifying maxBatchSize as 1000, but based on the logs in App Insights the vast majority (realistically almost all) of executions are only receiving a batch size of one. I am aware that maxBatchSize is not a guarantee of how many messages you will receive in a batch. This is causing us to have 30k+ function executions/minute, which alone is pricey (even though the executions are < 1 second).

This has also caused an excessive number of transactions against the Storage Account: I believe it was something like 90k transactions/minute. I modified the batchCheckpointFrequency field from 1 to 5, and that helped cut the transactions down to < 30k/minute, but I still find that excessive. We are required to have advanced threat protection turned on for all Storage Accounts, and I believe I worked that out to be $40+ a day, which is ridiculous. Looking at the price breakdown for the Storage Account, advanced threat protection and write operations were the most costly. This one Storage Account has now become the majority of the cost for our resource group.
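For reference, the host.json settings this comment mentions live under the Event Hubs extension section; a sketch of the shape (property names and nesting vary between extension versions, so treat this as approximate):

```json
{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 5,
      "eventProcessorOptions": {
        "maxBatchSize": 1000,
        "prefetchCount": 4000
      }
    }
  }
}
```

Raising batchCheckpointFrequency trades fewer Storage Account writes for a larger replay window after a crash, which is the lever the commenter used to cut transactions from ~90k to under 30k per minute.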

I would like to see a configuration available in host.json that would allow me to configure a minimum batch size (I am guaranteed to receive at least that many messages per execution) and a maximum batching size window (if minimum batch size is not met within some configurable amount of time, give me what you have so far). This is what AWS Lambda has with Kinesis and it is really useful. Link to some Terraform configuration that controls this here and here. This would hopefully cut down on the number of executions that my function has and ultimately reduce the number of transactions against the Storage Account. This would also help our code be more performant since the downstream service we are sending logs to can be a bottleneck. To get around this we batch up logs into one request before pushing but we aren’t getting that benefit if each function invocation is only dealing with one message at a time. As long as configurations are properly documented, I am happy to work through adjusting the values to meet my function’s needs.
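The AWS Lambda/Kinesis knobs the comment refers to are, for comparison, `batch_size` and `maximum_batching_window_in_seconds` on Terraform’s `aws_lambda_event_source_mapping` resource. A minimal sketch (the ARN and function name are placeholders):

```hcl
resource "aws_lambda_event_source_mapping" "kinesis" {
  event_source_arn  = "arn:aws:kinesis:us-east-1:123456789012:stream/example" # placeholder
  function_name     = "example-function"                                      # placeholder
  starting_position = "LATEST"

  batch_size                         = 10000 # upper bound on records per invocation
  maximum_batching_window_in_seconds = 60    # wait up to this long to fill a batch
}
```

Note that `batch_size` is a maximum, not a guaranteed minimum; it is the batching window that lets Lambda wait for a fuller batch, which is the behavior the commenter wants mirrored in host.json.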

Also, I would appreciate any advice to help cut down on our Storage Account transactions until this would be implemented.

1 reaction
jsquire commented, Oct 6, 2021

@JoshLove-msft: No, it hasn’t; it’s not something that we’ve seen feedback requesting.

For the majority case, EventProcessorClient is the processor type in use, which is single-dispatch for delivering events to handlers. For the EventProcessor&lt;T&gt; that underpins the Functions extensions, the focus has been to maximize throughput by dispatching as quickly as possible once any events are available in the prefetch queue.

I think the Functions scenario is somewhat unique in that there’s cost associated with invocations. I’d be inclined to say that we should consider building this into the EventProcessorHost in the Functions bindings to start and reassess moving to the base class if we see a more general demand for it.
