EventDataBatch.TryAdd does not copy the buffer into the batch as the documentation indicates
Library name and version
Azure.Messaging.EventHubs v5.7.2
Describe the bug
The documentation for Azure.Messaging.EventHubs.Producer.EventDataBatch.TryAdd has the following remarks:
"When an event is accepted into the batch, its content and state are frozen; any changes made to the event will not be reflected in the batch nor will any state transitions be reflected to the original instance."
However, when a shared buffer is reused as the backing source for each EventData instance, the batch does not keep its own copy of the data: later changes to the buffer show up in the EventDataBatch.
Expected behavior
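```csharp
using var eventDataBatch = await _eventHubClient.CreateBatchAsync();
var buffer = new byte[1024];
while (true)
{
    // Reuse one buffer for every event: clear it, then fill it with
    // the next chunk of input (someInput is assumed to be a byte[]).
    Array.Clear(buffer, 0, buffer.Length);
    someInput.CopyTo(buffer, 0);

    // Expectation: TryAdd captures the buffer's current contents,
    // so the next iteration can safely overwrite it.
    if (!eventDataBatch.TryAdd(new EventData(buffer)))
        break;
}
```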
Based on the remark, I would expect TryAdd to serialize the buffer's contents into the raw AMQP message that is added to the batch, so that the buffer can be reused in the add loop.
Actual behavior
What I am seeing, however, is that clearing the buffer is reflected in the batch itself. This makes reusing a buffer impossible and makes the remark above inaccurate at best and, in my case, quite dangerous.
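This behavior is consistent with the event wrapping the caller's array rather than copying it. The following is a minimal, SDK-free sketch that assumes the event body is held as a ReadOnlyMemory<byte> view over the original byte[]. The maintainer comments below say that the batch references the event's memory buffers. A view like this sees every later write to the array, while a snapshot taken with ToArray() does not.

```csharp
using System;

// The caller's reusable buffer, holding the first "message".
byte[] buffer = { 1, 2, 3, 4 };

// A read-only view over the same array: no bytes are copied.
// (Assumption: this models how the event body references the buffer.)
ReadOnlyMemory<byte> view = buffer;

// A snapshot: ToArray() copies the bytes into a new, independent array.
byte[] snapshot = view.ToArray();

// Reuse the buffer, as the loop in the report does.
Array.Clear(buffer, 0, buffer.Length);
buffer[0] = 9;

// The view reflects the mutation; the snapshot does not.
Console.WriteLine(string.Join(",", view.ToArray()));
Console.WriteLine(string.Join(",", snapshot));
```

"Read-only" here means that code holding the view cannot write through it. It does not stop the owner of the array from changing the bytes underneath.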
Reproduction Steps
See the code under Expected behavior above.
Environment
I see the same behavior on Windows, Linux, and macOS, using .NET 6.
Top GitHub Comments
Thank you all for the perspectives and discussion in the prototype PR (#31845). Those conversations ended in agreement that the performance cost of making a defensive copy outweighs the benefit of guaranteeing that the batch is fully immutable. Instead of copying, we've updated the documentation to call out that any memory buffers held by the event are referenced by the batch and must remain available and unchanged until the batch is disposed (see #3204).
Closing this out.
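Under the documented contract, the caller is responsible for copying when a buffer is reused. The following sketch shows the reuse loop with a per-event copy. A live Event Hub isn't practical here, so the batch is modeled by a hypothetical stand-in, a List<ReadOnlyMemory<byte>> that stores references without copying, and the input chunks are hypothetical sample data. Only the bytes actually filled are copied, so each entry owns an independent array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for EventDataBatch: it stores references to
// event bodies without copying them, mirroring the documented contract.
var batch = new List<ReadOnlyMemory<byte>>();

// Hypothetical input: three chunks read through one reusable buffer.
byte[][] inputs = { new byte[] { 1, 1 }, new byte[] { 2, 2, 2 }, new byte[] { 3 } };
var buffer = new byte[1024];

foreach (byte[] input in inputs)
{
    // Reuse the buffer for reading, as in the original loop.
    Array.Clear(buffer, 0, buffer.Length);
    input.CopyTo(buffer, 0);
    int length = input.Length;

    // Copy only the filled bytes into a new array before handing them off,
    // so the next iteration cannot overwrite what the batch references.
    // (With the SDK, the analogous call would likely be
    //  new EventData(buffer.AsSpan(0, length).ToArray()).)
    batch.Add(buffer.AsSpan(0, length).ToArray());
}

foreach (ReadOnlyMemory<byte> body in batch)
{
    Console.WriteLine(string.Join(",", body.ToArray()));
}
```

If the loop added `buffer` itself, all three entries would reference the same array and would read back the last chunk. That matches the behavior described in the report.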
Let me see if I can find a public write-up on ReadOnlyMemory<T> (ROM<T>) where the .NET team discusses its contract.
In the meantime, I'll mention that adding a specific guard probably wouldn't change the number of allocations in your scenario. We would need to make a defensive copy of the buffer you pass in, and that copy would become the buffer the serialized message body points to. Whether we allocate it or you do, the net result is the same.
The main difference is that we would likely increase allocations for the common case, where callers don't mutate the body after creating the event, because we would have to assume that any byte[] is potentially unsafe.
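The following sketch makes the trade-off concrete by measuring one event body. Wrapping an existing array in ReadOnlyMemory<byte> allocates nothing, because the view is a struct. A defensive copy allocates a new array the size of the body, and that cost applies to every event, including events whose callers never mutate the buffer. The sketch assumes .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread. The 64 KB size is arbitrary (it is kept below the large object heap threshold), and exact byte counts will vary by runtime.

```csharp
using System;

const int Size = 64 * 1024; // below the 85,000-byte large object heap threshold
byte[] body = new byte[Size];

// Warm up both paths so one-time JIT work is not counted below.
ReadOnlyMemory<byte> warmView = body;
byte[] warmCopy = body.AsSpan().ToArray();

// Wrapping: ReadOnlyMemory<byte> is a struct over the existing array.
long before = GC.GetAllocatedBytesForCurrentThread();
ReadOnlyMemory<byte> wrapped = body;
long wrapBytes = GC.GetAllocatedBytesForCurrentThread() - before;

// Defensive copy: allocates a new array at least as large as the body.
before = GC.GetAllocatedBytesForCurrentThread();
byte[] copied = body.AsSpan().ToArray();
long copyBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"wrap: {wrapBytes} bytes, copy: {copyBytes} bytes");
```

The copy costs the same whether the caller or the library makes it, so the guard would mainly add cost for callers that never mutate their buffers.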