BlockBlobWriteStream flush method creates a new blob version
Describe the bug
We use the OpenWrite method of BlockBlobClient to create writable streams to Azure block blob storage. Whenever we call Flush to clear the internal buffer of the write stream, the stream appears to tell the storage service that we are done writing, and the blob is committed. Each “flush” call therefore creates a new blob version while the write stream is still open.
Some background: We use 3rd party libraries that can handle streams. We pass a BlockBlobWriteStream to a method of a 3rd party library to save some file to the stream. This library calls flush multiple times, causing loads of blob versions in blob storage. As a workaround we have now implemented a wrapper stream that ignores flush calls.
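The workaround mentioned above could look roughly like the sketch below. This is not the code from the report: `FlushIgnoringStream` is a hypothetical name, and the sketch assumes that disposing the inner stream is what performs the final commit of the blob. It forwards writes unchanged and turns `Flush` and `FlushAsync` into no-ops.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Azure.Storage.Blobs): a write-only wrapper
// that forwards writes to the inner stream but swallows flush calls.
public sealed class FlushIgnoringStream : Stream
{
    private readonly Stream _inner;

    public FlushIgnoringStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Intentionally no-ops: the inner stream is never asked to flush here.
    public override void Flush() { }

    public override Task FlushAsync(CancellationToken cancellationToken)
        => Task.CompletedTask;

    // Writes pass straight through to the inner stream.
    public override void Write(byte[] buffer, int offset, int count)
        => _inner.Write(buffer, offset, count);

    public override Task WriteAsync(byte[] buffer, int offset, int count,
                                    CancellationToken cancellationToken)
        => _inner.WriteAsync(buffer, offset, count, cancellationToken);

    public override int Read(byte[] buffer, int offset, int count)
        => throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin)
        => throw new NotSupportedException();

    public override void SetLength(long value)
        => throw new NotSupportedException();

    // Assumption: disposing the inner stream finalizes the blob once.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

A caller would wrap the blob stream before handing it to the library, for example `new FlushIgnoringStream(blockBlob.OpenWrite(true))`, and dispose the wrapper when the library has finished writing. The trade-off is that data stays in the inner stream's buffer until that buffer fills or the stream is disposed.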
Expected behavior
We would expect the “flush” method to only push the internal buffer to Azure Storage, without causing a new version of the blob to be created.
Actual behavior (include Exception or Stack Trace)
Currently, each call to “flush” creates a new blob version. Even more interesting: we discovered that, very rarely, a file is not written to blob storage correctly and becomes corrupt. This seems to be related to the flush calls, because the corruption seems to have disappeared since we implemented the wrapper stream that avoids them. We haven’t been able to reproduce the corruption in a test scenario, though.
To Reproduce
Make sure you have a storage account with blob versioning enabled.
The following snippet creates 100 versions of the same blob, while I would expect only 1 version to be created:
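    // blockBlob is a BlockBlobClient in a storage account with blob versioning enabled.
    var bytes = new byte[100]; // placeholder for <a bunch of bytes>
    using (var inputstream = blockBlob.OpenWrite(true)) // true = overwrite an existing blob
    {
        for (var i = 0; i < 100; i++)
        {
            inputstream.Write(bytes, 0, 100);
            inputstream.Flush(); // each Flush call produces a new blob version
        }
    }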
Environment:
- Azure.Storage.Blobs 12.8.1
- Windows 10 .NET Framework 4.8
- Visual Studio 16.9.4
Issue Analytics
- Created 2 years ago
- Reactions: 7
- Comments: 13 (7 by maintainers)
Top GitHub Comments
I recently encountered the same problem, which is also related to excessive flushes, and I managed to make it fully reproducible:
Corruption always happens if you do this:
4+1 and 4+3 are fine; 4+2 always causes corruption.
More details: https://stackoverflow.com/questions/67280408/specific-combinations-of-flush-and-write-calls-corrupts-blockblob-used-via-openw
Update: The bug is reproducible until versions 12.9.0-beta.2 and 12.8.3. Some versions older than the latest stable also fail, but in a different way (the tail is gone, not duplicated), so I am not entirely sure whether the bug is gone or now has a different form.
https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/storage/Azure.Storage.Blobs/CHANGELOG.md#1290-beta2-2021-03-09
https://github.com/Azure/azure-sdk-for-net/commit/934afe12138ed001bef466d61a6555fa87c66140#diff-d28bf63a473ece5a4fdafca01fdf392a56ce102979384e7ce631d9e3f7a1b52b
This would be the best of both worlds, and everyone will be happy.