[BUG] BlobClient.UploadAsync with an unseekable stream always buffers the entire stream
Describe the bug
While writing a lightweight service that receives content in a `Stream` and passes it on to a blob upload, I encountered pathological buffering of the entire `Stream` content.
I believe this is caused by the way the blob client implementation uses `HttpClient`:
- `BlobClient.UploadAsync(Stream)` calls the extension method `BlobOperationsExtensions.UploadAsync`.
- `BlobOperationsExtensions.UploadAsync` calls `BlobOperations.UploadWithHttpMessagesAsync`.
- `BlobOperations.UploadWithHttpMessagesAsync` sets up an `HttpRequestMessage`, setting its `Content` property to an instance of `StreamContent` with the supplied `Stream`.
- This `HttpRequestMessage` is then delivered using `HttpClient.SendAsync`.
- `HttpClient.SendAsync` handles the message using the `HttpClientHandler` class.
- The `PrepareAndStartContentUpload` method in the `HttpClientHandler` class checks whether the `HttpContent` object it has been given specifies a `Content-Length` header. If no `Content-Length` header is provided, it buffers the content in order to determine its length.

But the `HttpContent` in question is the `StreamContent` initialized earlier by `UploadWithHttpMessagesAsync` in `BlobOperations`, and it does not set a `Content-Length` header.
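To illustrate the point (this snippet is mine, not SDK code): wrapping a non-seekable stream in `StreamContent` leaves `Content-Length` unset, which is exactly the condition that makes the handler buffer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;

// Illustration only: a GZipStream over a file is a typical non-seekable stream.
using var nonSeekable = new GZipStream(
    File.OpenRead("large.bin"), CompressionMode.Decompress);

var content = new StreamContent(nonSeekable);

// StreamContent.TryComputeLength cannot determine a length for a non-seekable
// stream, so no Content-Length header value is available, and HttpClientHandler
// falls back to buffering the whole body to find out how long it is.
Console.WriteLine(content.Headers.ContentLength.HasValue); // False
```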
Expected behavior
Data is transferred from the stream in manageably-sized chunks, so that memory usage stays bounded and predictable regardless of the stream size. Specifically, if the `Stream` is seekable, then the stream's `Length` should be used for the `Content-Length` header value automatically. (This strategy is used elsewhere inside `StreamContent`, in its `TryComputeLength` method.)
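As a rough sketch of the behavior I would expect for a seekable stream (illustrative code, not a proposed patch): the remaining length is knowable up front, so the header could be populated without any buffering.

```csharp
using System.IO;
using System.Net.Http;

// Sketch: when the stream is seekable, its remaining length can be supplied as
// the Content-Length header, so the handler never needs to buffer the body.
static StreamContent CreateContent(Stream stream)
{
    var content = new StreamContent(stream);
    if (stream.CanSeek)
    {
        content.Headers.ContentLength = stream.Length - stream.Position;
    }
    return content;
}
```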
Actual behavior (include Exception or Stack Trace)
If the stream contains 500 MB of data, then the process memory usage will grow by 500 MB before the blob upload operation actually initiates the network connection to the Azure servers. If the stream contains more than 2 GB of data, then the blob upload operation will fail because `MemoryStream` will refuse to store more than 2 GB of data.
To Reproduce
Using a `BlobClient`, call an `UploadAsync` overload that accepts a `Stream` for the data. Provide a `FileStream` pointing at a large file so the effect on process memory is more easily noticeable.
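A minimal repro sketch (the connection string, container name, blob name, and file path below are placeholders):

```csharp
using System.IO;
using Azure.Storage.Blobs;

// Placeholders: substitute a real connection string, container, blob name, and file.
var blobClient = new BlobClient(connectionString, "repro-container", "large-file.bin");

using FileStream fileStream = File.OpenRead(@"C:\temp\large-file.bin");

// Watch the process's memory while this runs: with default options the entire
// stream is buffered before any network traffic to Azure starts.
await blobClient.UploadAsync(fileStream, overwrite: true);
```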
Environment:
- Azure.Storage.Blobs 12.10.0
- Windows 10, .NET Core 3.1
- Visual Studio 16.11.3
Top GitHub Comments
@jaschrep-msft can we please create a documentation task to get this documented (improvement to the SDK’s uploadAsync documentation) in the backlog?
Indeed, it appears that the key thing was having `StorageTransferOptions` fully populated. With default options, it would always try to ascertain the length before starting the upload, buffering the entire stream in the process. But now that `InitialTransferSize` and `MaximumTransferSize` are set, the original unseekable stream can be provided without memory usage growing without bound. I just did a transfer of a 2.7 GB file, and memory usage climbed continuously for a good 10% of the transfer, but more and more slowly as time went on, eventually reversing direction and gradually dropping, slowly fluctuating.

I repeated the test with `MaximumConcurrency` set to 5 with very similar results. In both tests, I interrupted the network to verify it could restart and found no issues. To be clear, forwarding this file transfer from an incoming web request to an outbound `BlobClient.UploadAsync` call is the only thing the process was doing. I'm not sure why memory usage increased so much with a single upload thread and a 512 KB buffer size. Perhaps it is something within the ASP.NET Core stack or Kestrel. In any case, it didn't end up trying to store the entire 2.7 GB file in a `MemoryStream` as it had originally.

Documentation that calls out the significance of the stream being seekable and the importance of `InitialTransferSize` and `MaximumTransferSize` for non-seekable streams would indeed be very helpful. 😃
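For anyone landing here, this is the shape of the workaround described above; `blobClient` and `incomingRequestStream` are placeholders, and the sizes are just the values from my test, not recommendations.

```csharp
using Azure.Storage;
using Azure.Storage.Blobs.Models;

// Transfer sizes from the test described above; tune for your own workload.
var uploadOptions = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        InitialTransferSize = 512 * 1024,   // 512 KB first request
        MaximumTransferSize = 512 * 1024,   // 512 KB per subsequent block
        MaximumConcurrency = 5
    }
};

// With both transfer sizes set, the unseekable stream is uploaded in blocks
// instead of being buffered in its entirety before the upload starts.
await blobClient.UploadAsync(incomingRequestStream, uploadOptions);
```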