[Design Discussion] Should Event/Message Batches be Scoped to a Single Send or Reusable?

Summary

The current design of EventDataBatch in the Event Hubs client and ServiceBusMessageBatch in the Service Bus client treats a batch as scoped to a single send operation: once a batch is full and has been published, it is intended to be disposed and a new batch created for any subsequent send operation. As a result, batches are add-only entities: events/messages may be added, and they are then held by the batch until it is disposed.
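
A minimal sketch of the single-send pattern described above, written in Python for illustration. `ToyBatch` and `publish_with_fresh_batches` are hypothetical stand-ins, not the Event Hubs or Service Bus client API, and the 16-byte limit is an arbitrary assumption.

```python
class ToyBatch:
    """Hypothetical stand-in for a batch; not the real client type."""

    def __init__(self, max_size_in_bytes):
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self.count = 0
        self._disposed = False

    def try_add(self, payload: bytes) -> bool:
        # Payloads can only be added; there is no remove, peek, or clear.
        if self._disposed:
            raise RuntimeError("batch has been disposed")
        if self.size_in_bytes + len(payload) > self.max_size_in_bytes:
            return False
        self.size_in_bytes += len(payload)
        self.count += 1
        return True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Leaving the "with" block models disposing the batch.
        self._disposed = True
        return False


def publish_with_fresh_batches(payloads, max_size_in_bytes=16):
    """Publish every payload, creating a new batch for each send."""
    sent_counts = []
    pending = list(payloads)
    while pending:
        with ToyBatch(max_size_in_bytes) as batch:
            while pending and batch.try_add(pending[0]):
                pending.pop(0)
            if batch.count == 0:
                raise ValueError("payload too large for an empty batch")
            sent_counts.append(batch.count)  # stands in for the send call
    return sent_counts
```

Each pass through the outer loop creates, fills, sends, and disposes one batch, which is the lifecycle the current design assumes.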

Scope of Discussion

Should the batch support a Clear operation that could be called to allow it to be cleared after a send operation and reused?
Does allowing the batch to be emptied and reused cause confusion with respect to consideration for when it should be disposed?
Would including the ability to clear a batch be perceived as awkward without the ability to peek at events in a batch and/or remove a specific item?

Out of Scope

Access to individual events in a batch; some implementations do not hold a reference to the event/message when added to the batch. Instead, the event/message is translated to the resulting wire format (such as an AMQP message) and membership in the batch is based on the wire format.

Considerations

The maximum allowable size for a batch is determined by the service and communicated to the client. Different service SKUs allow different sizes for a single send operation; the reason that batch creation is asynchronous is to allow the service to be queried. The service value is cached and used for any subsequent batch creations. Only the first batch creation or send operation pays the tax of making the query.
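
The caching behavior described above might look roughly like the following Python sketch. `ToyProducer`, its `create_batch` method, and the 1024-byte limit are assumptions made for illustration only; they do not reflect the actual client implementation.

```python
import asyncio


class ToyProducer:
    """Hypothetical producer; models only the caching of the size limit."""

    def __init__(self, service_max_size_in_bytes):
        self._service_max = service_max_size_in_bytes
        self._cached_max = None
        self.query_count = 0

    async def _query_service_limit(self):
        # Stands in for the network round trip to the service.
        self.query_count += 1
        await asyncio.sleep(0)
        return self._service_max

    async def create_batch(self):
        # Only the first call pays for the query; later calls reuse the cache.
        if self._cached_max is None:
            self._cached_max = await self._query_service_limit()
        return {"max_size_in_bytes": self._cached_max, "events": []}


async def demo():
    producer = ToyProducer(service_max_size_in_bytes=1024)
    first = await producer.create_batch()
    second = await producer.create_batch()
    return (
        producer.query_count,
        first["max_size_in_bytes"],
        second["max_size_in_bytes"],
    )
```

In this sketch, two concurrent first calls could each issue the query; a real client would need to guard against that.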
To determine the size of a batch, the events/messages in the batch must be serialized to the wire format of their protocol (for example, an AMQP message); adding a message to the batch requires paying the serialization cost to measure the resulting size in bytes. It is not possible to defer that action and accurately predict the batch size.
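
As a rough illustration of why the serialization cost is paid at add time, here is a Python sketch. `MeasuredBatch` and `to_wire_format` are hypothetical names, and JSON is used only as a placeholder for the real wire format; per-message protocol overhead is ignored.

```python
import json


def to_wire_format(event: dict) -> bytes:
    # JSON is only a placeholder for the real wire format (for example, AMQP).
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


class MeasuredBatch:
    """Hypothetical batch that stores only the serialized form of each event."""

    def __init__(self, max_size_in_bytes):
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self._wire_messages = []

    def try_add(self, event: dict) -> bool:
        # Serialization happens here, at add time, because the size in bytes
        # is unknown until the event is in its wire format.
        wire = to_wire_format(event)
        if self.size_in_bytes + len(wire) > self.max_size_in_bytes:
            return False
        self._wire_messages.append(wire)  # the original event is not retained
        self.size_in_bytes += len(wire)
        return True

    @property
    def count(self):
        return len(self._wire_messages)
```

Because only the serialized bytes are kept, a batch like this has no original event objects to expose, which is one way the out-of-scope item above can arise.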
Allowing visibility into or manipulation of individual messages/events in a batch could double the memory needed by some batch implementations, depending on the language, or could come with a performance cost. For this reason, manipulating and exposing individual events in the batch is not currently open to consideration.
Historically, one reason a Clear operation was not considered is that the batch was attempting some “clever” optimizations in how it managed the conversion of events/messages to their respective wire format. Early previews of Event Hubs surfaced corner cases that resulted in a more straightforward implementation.
There are some potential enhancements that are under consideration for the internal batch implementation to improve performance and lower resource costs. It will be important to consider the impact of any batch API changes against the potential improvements.

Issue Analytics

Created 3 years ago
Comments: 16 (15 by maintainers)

Top GitHub Comments

2 reactions

JoshLove-msft commented, Apr 15, 2020

We’ve received feedback that, in some cases, creating a batch is awkward due to the asynchronous nature of the call, and developers would prefer to have a flow something like:
- Create a batch
- While there are events/messages to publish:
    - Add events/messages to the batch
    - Publish the batch
    - Clear the batch

Hmm, but publishing the batch is still asynchronous. And I guess creating the batch is only asynchronous on the first call; subsequent calls should complete synchronously.
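
For reference, the quoted create/add/publish/clear flow could be sketched in Python as below. `ReusableBatch`, its `clear` method, and `fake_send` are hypothetical; no such Clear operation is confirmed to exist in the clients, and the 16-byte limit is an arbitrary assumption.

```python
import asyncio


class ReusableBatch:
    """Hypothetical batch with the proposed clear operation."""

    def __init__(self, max_size_in_bytes):
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self._payloads = []

    @property
    def count(self):
        return len(self._payloads)

    def try_add(self, payload: bytes) -> bool:
        if self.size_in_bytes + len(payload) > self.max_size_in_bytes:
            return False
        self._payloads.append(payload)
        self.size_in_bytes += len(payload)
        return True

    def clear(self):
        # Empties the batch so the same instance can be filled again.
        self._payloads.clear()
        self.size_in_bytes = 0


async def fake_send(batch, sent):
    # Stands in for the asynchronous publish call.
    await asyncio.sleep(0)
    sent.append(batch.count)


async def publish_with_reuse(payloads, max_size_in_bytes=16):
    sent = []
    pending = list(payloads)
    batch = ReusableBatch(max_size_in_bytes)  # created once
    while pending:
        while pending and batch.try_add(pending[0]):
            pending.pop(0)
        if batch.count == 0:
            raise ValueError("payload too large for an empty batch")
        await fake_send(batch, sent)  # publishing is still awaited
        batch.clear()  # reuse the instance instead of creating a new batch
    return sent
```

The batch is created once outside the loop, but each send is still awaited, which matches the observation that publishing remains asynchronous either way.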

1 reaction

SeanFeldman commented, Apr 22, 2020

This has potentially been discussed internally already: the differences in batch usage between Event Hubs and Service Bus customers. Event Hubs is more telemetry-focused, where reusing the batch would be very logical. Service Bus is more associated with business applications, where events are discrete and batch sending takes place under certain conditions rather than continuously. There, a batch is used once and disposed of rather than reused.
