BlockBlobWriteStream flush method creates a new blob version


Describe the bug

We use the OpenWrite method of BlockBlobClient to create writable streams to Azure block blob storage. Whenever we call Flush to clear the internal buffer of the write stream, the stream seems to tell the storage container that we are done writing, and the blob is committed. This creates a new blob version for each Flush call while the write stream is still open.

Some background: We use 3rd party libraries that can handle streams. We pass a BlockBlobWriteStream to a method of a 3rd party library to save some file to the stream. This library calls flush multiple times, causing loads of blob versions in blob storage. As a workaround we have now implemented a wrapper stream that ignores flush calls.
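A minimal sketch of what such a wrapper stream might look like. The class name FlushIgnoringStream is hypothetical: it is not part of the Azure SDK and is not the reporter's actual code. It assumes that forwarding every call except Flush and FlushAsync to the inner stream is enough for the consuming library.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: forwards everything to the inner stream
// except Flush/FlushAsync, which are deliberately ignored.
public sealed class FlushIgnoringStream : Stream
{
    private readonly Stream _inner;

    public FlushIgnoringStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;

    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    // No-op: the inner stream never sees flush requests from the caller.
    public override void Flush() { }

    // No-op for the async variant as well.
    public override Task FlushAsync(CancellationToken cancellationToken) => Task.CompletedTask;

    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);

    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);

    public override void SetLength(long value) => _inner.SetLength(value);

    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Disposing the wrapper disposes the inner stream.
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

It could be used by passing new FlushIgnoringStream(blockBlob.OpenWrite(true)) to the 3rd party library instead of the raw stream. The trade-off is that nothing is pushed to storage on the library's flush calls; the data is then expected to be committed only when the stream is disposed, as the workaround described above implies.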

Expected behavior

We would expect the Flush method to only push the internal buffer to Azure Storage, without causing a new version of the blob to be created.

Actual behavior (include Exception or Stack Trace)

Currently, each call to “flush” creates a new blob version. Even more interesting: we discovered that every once in a while (very rarely) a file doesn’t get written to blob storage correctly and the file becomes corrupt. This seems to be related to the flush calls, because since we implemented the wrapper stream to avoid flush calls, the file corruption seems to have disappeared. We haven’t been able to reproduce the corruption in a test scenario though.

To Reproduce

Make sure you have a storage account with blob versioning enabled.

The following snippet creates 100 versions of the same blob, while I would expect only 1 version to be created:

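// Placeholder payload; any buffer of at least 100 bytes works.
var bytes = new byte[100];
// blockBlob is a BlockBlobClient; OpenWrite(true) overwrites any existing blob.
using (var inputstream = blockBlob.OpenWrite(true))
{
  for (var i = 0; i < 100; i++)
  {
    inputstream.Write(bytes, 0, 100);
    // Each Flush call results in a new blob version.
    inputstream.Flush();
  }
}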

Environment:

  • Azure.Storage.Blobs 12.8.1
  • Windows 10 .NET Framework 4.8
  • Visual Studio 16.9.4

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 7
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

5 reactions
yapaxi commented, Apr 28, 2021

I recently encountered the same problem, also related to excessive flushes, and I managed to make it fully reproducible:

Corruption always happens if you do this:

  • write 4 bytes
  • flush
  • write 2 bytes
  • dispose the stream
  • read the file back

Writing 4 bytes followed by 1 byte (4+1) or by 3 bytes (4+3) is fine; 4+2 always causes corruption.
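The steps above can be sketched as a small round-trip check. This is only an illustration under stated assumptions: the FlushRepro class and its method are hypothetical names, and the stream factory and read-back delegate are placeholders for whatever opens and downloads the blob.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper that mirrors the listed steps:
// write `first` bytes, flush, write `second` bytes, dispose, read back.
public static class FlushRepro
{
    // openWrite: returns the writable stream under test (assumed to be a fresh stream).
    // readBack:  returns the full stored content after the stream is disposed.
    public static bool RoundTrips(Func<Stream> openWrite, Func<byte[]> readBack, int first, int second)
    {
        // Distinct byte values make a missing or duplicated tail easy to spot.
        byte[] payload = Enumerable.Range(1, first + second).Select(i => (byte)i).ToArray();

        using (Stream target = openWrite())
        {
            target.Write(payload, 0, first);
            target.Flush();
            target.Write(payload, first, second);
        } // Dispose happens here.

        // True when the stored content matches exactly what was written.
        return readBack().SequenceEqual(payload);
    }
}
```

Assuming blob is a BlockBlobClient, the check might be wired up as FlushRepro.RoundTrips(() => blob.OpenWrite(true), () => { var ms = new MemoryStream(); blob.DownloadTo(ms); return ms.ToArray(); }, 4, 2). Based on the report above, the 4+2 case would be the one expected to return false on affected versions.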

More details: https://stackoverflow.com/questions/67280408/specific-combinations-of-flush-and-write-calls-corrupts-blockblob-used-via-openw

Update:

The bug is reproducible up until versions 12.9.0-beta.2 and 12.8.3. Some versions older than the latest stable also fail, but in a different way (the tail is gone, not duplicated), so I am not entirely sure whether the bug is gone or now has a different form.

https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/storage/Azure.Storage.Blobs/CHANGELOG.md#1290-beta2-2021-03-09

https://github.com/Azure/azure-sdk-for-net/commit/934afe12138ed001bef466d61a6555fa87c66140#diff-d28bf63a473ece5a4fdafca01fdf392a56ce102979384e7ce631d9e3f7a1b52b

0 reactions
PaulVrugt commented, Apr 29, 2021

This would be the best of both worlds, and everyone will be happy.


