[Design Discussion] Should Event/Message Batches be Scoped to a Single Send or Reusable?
Summary
The current design of the EventDataBatch in the Event Hubs client and ServiceBusMessageBatch considers them scoped to a single send operation; once a batch is full and has been published, it is intended to be disposed and a new batch created for any subsequent send operations. As a result, batches are read-only entities into which events/messages may be added and then are held by the batch until it is disposed.
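The single-send scoping described above can be sketched from the caller's side. This is a minimal toy model, not the SDK: `ToyBatch`, `publish_all`, the `send` callback, and the byte-length size accounting are all hypothetical names and simplifications introduced for illustration.

```python
from typing import Callable, Iterable, List


class ToyBatch:
    """Hypothetical stand-in for EventDataBatch / ServiceBusMessageBatch."""

    def __init__(self, max_size_in_bytes: int) -> None:
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self.disposed = False
        self._items: List[bytes] = []

    def try_add(self, payload: bytes) -> bool:
        # Refuse the item if it would push the batch past its size limit.
        if self.disposed:
            raise RuntimeError("batch has been disposed")
        if self.size_in_bytes + len(payload) > self.max_size_in_bytes:
            return False
        self._items.append(payload)
        self.size_in_bytes += len(payload)
        return True

    def __len__(self) -> int:
        return len(self._items)

    def dispose(self) -> None:
        # Release everything the batch was holding.
        self._items.clear()
        self.disposed = True


def publish_all(
    payloads: Iterable[bytes],
    max_size_in_bytes: int,
    send: Callable[[ToyBatch], None],
) -> int:
    """Send every payload, using a fresh batch for each send operation."""
    sends = 0
    batch = ToyBatch(max_size_in_bytes)
    for payload in payloads:
        if batch.try_add(payload):
            continue
        if len(batch) == 0:
            raise ValueError("payload does not fit in an empty batch")
        # The batch is full: publish it, dispose it, and start a new one.
        send(batch)
        sends += 1
        batch.dispose()
        batch = ToyBatch(max_size_in_bytes)
        if not batch.try_add(payload):
            raise ValueError("payload does not fit in an empty batch")
    if len(batch) > 0:
        send(batch)
        sends += 1
    batch.dispose()
    return sends
```

In this sketch each batch instance is sent at most once; a full batch is always followed by dispose-and-recreate, which is the pattern the discussion below asks whether to relax.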
Scope of Discussion
- Should the batch support a Clear operation that could be called to allow it to be cleared after a send operation and reused?
- Does allowing the batch to be emptied and reused cause confusion with respect to consideration for when it should be disposed?
- Would including the ability to clear a batch be perceived as awkward without the ability to peek at events in a batch and/or remove a specific item?
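To make the questions above concrete, here is a rough sketch of what reuse through a Clear-style operation might look like to a caller. Nothing here reflects an actual or planned API: `ReusableToyBatch`, its `clear` method, and `publish_all_reusing` are hypothetical, and payload size is approximated by byte length.

```python
from typing import Callable, Iterable, List


class ReusableToyBatch:
    """Hypothetical batch with a Clear-style operation (illustration only)."""

    def __init__(self, max_size_in_bytes: int) -> None:
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self._items: List[bytes] = []

    def try_add(self, payload: bytes) -> bool:
        if self.size_in_bytes + len(payload) > self.max_size_in_bytes:
            return False
        self._items.append(payload)
        self.size_in_bytes += len(payload)
        return True

    def clear(self) -> None:
        # Hypothetical: empty the batch so the same instance can be refilled.
        self._items.clear()
        self.size_in_bytes = 0

    def __len__(self) -> int:
        return len(self._items)


def publish_all_reusing(
    payloads: Iterable[bytes],
    batch: ReusableToyBatch,
    send: Callable[[ReusableToyBatch], None],
) -> int:
    """Send every payload through one batch instance, clearing after each send."""
    sends = 0
    for payload in payloads:
        if batch.try_add(payload):
            continue
        if len(batch) == 0:
            raise ValueError("payload does not fit in an empty batch")
        send(batch)
        sends += 1
        batch.clear()  # reuse instead of dispose-and-recreate
        if not batch.try_add(payload):
            raise ValueError("payload does not fit in an empty batch")
    if len(batch) > 0:
        send(batch)
        sends += 1
        batch.clear()
    return sends
```

Note that the batch now outlives any single send, so the caller, not the send loop, decides when it is finished with it. That is the disposal ambiguity raised in the second question.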
Out of Scope
- Access to individual events in a batch; some implementations do not hold a reference to the event/message when added to the batch. Instead, the event/message is translated to the resulting wire format (such as an AMQP message) and membership in the batch is based on the wire format.
Considerations
- The maximum allowable size for a batch is determined by the service and communicated to the client. Different service SKUs allow different sizes for a single send operation; the reason that batch creation is asynchronous is to allow the service to be queried. The service value is cached and used for any subsequent batch creations. Only the first batch creation or send operation pays the tax of making the query.
- To determine the size of a batch, the events/messages in the batch must be serialized to the wire format of their protocol (for example, an AMQP message); adding a message to the batch requires paying the serialization cost to measure the resulting size in bytes. It is not possible to defer that action and accurately predict the batch size.
- Allowing visibility or manipulation of individual messages/events in a batch would potentially double the memory needed for some batch implementations, depending on the language, or would potentially come with a performance cost. For this reason, manipulating and exposing individual events in the batch is not currently open to consideration.
- Historically, one reason that a Clear operation was not considered is that the batch was attempting some “clever” optimizations with respect to managing the translation of events/messages to their respective wire format. Early previews of Event Hubs surfaced corner cases that resulted in a more straightforward implementation.
- There are some potential enhancements that are under consideration for the internal batch implementation to improve performance and lower resource costs. It will be important to consider the impact of any batch API changes against the potential improvements.
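The first two considerations can be illustrated together: the size limit is fetched once and cached, and each add pays for serialization so that the size can be measured. This is a sketch under stated assumptions, not the SDK's implementation. `ToyClient`, `ToyBatch`, and the limit of 64 bytes are made up, and JSON stands in for the AMQP wire format.

```python
import asyncio
import json
from typing import List, Optional


class ToyBatch:
    """Hypothetical batch that keeps only the encoded form of each event."""

    def __init__(self, max_size_in_bytes: int) -> None:
        self.max_size_in_bytes = max_size_in_bytes
        self.size_in_bytes = 0
        self._encoded: List[bytes] = []

    def try_add(self, event: dict) -> bool:
        # Serialize now: the size is only known once the wire form exists.
        encoded = json.dumps(event).encode("utf-8")  # JSON stands in for AMQP
        if self.size_in_bytes + len(encoded) > self.max_size_in_bytes:
            return False
        # Only the encoded bytes are kept; the original event is not referenced.
        self._encoded.append(encoded)
        self.size_in_bytes += len(encoded)
        return True


class ToyClient:
    """Hypothetical client; names and numbers are illustrative only."""

    def __init__(self) -> None:
        self._cached_max_size: Optional[int] = None
        self.service_queries = 0

    async def _query_max_size(self) -> int:
        # Stand-in for the round trip that asks the service for its limit.
        self.service_queries += 1
        await asyncio.sleep(0)
        return 64  # made-up limit, not a real service value

    async def create_batch(self) -> ToyBatch:
        # Only the first creation pays for the query; later calls hit the cache.
        if self._cached_max_size is None:
            self._cached_max_size = await self._query_max_size()
        return ToyBatch(self._cached_max_size)


async def demo():
    client = ToyClient()
    first = await client.create_batch()   # queries the (simulated) service
    second = await client.create_batch()  # served from the cached value
    added_small = first.try_add({"id": 1})
    added_large = second.try_add({"data": "x" * 100})
    return client.service_queries, added_small, added_large, first.size_in_bytes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the toy batch stores only encoded bytes, exposing or removing an individual event would require either keeping a second copy of each event or decoding on demand, which mirrors the memory and performance trade-off noted in the third consideration.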
Top GitHub Comments
Hmm, but publishing the batch is still asynchronous. And I guess creating the batch is only asynchronous on the first call; subsequent calls should complete synchronously.
Potentially this has already been discussed internally: the differences in batch usage between Event Hubs and Service Bus customers. Event Hubs is more telemetry focused, where reusing the batch would be very logical. Service Bus is more associated with business applications, where events are discrete and batch sending takes place under certain conditions rather than continuously. Therefore, a batch is used once and disposed of rather than reused.