
[BUG] BlobClient.UploadAsync with an unseekable stream always buffers the entire stream


Describe the bug
When trying to write a lightweight service that receives content in a Stream and passes it on to a blob upload, I encountered pathological buffering of the entire Stream content.

I believe this is caused by the way the blob client implementation uses HttpClient.

  • BlobClient.UploadAsync(Stream) calls the extension method BlobOperationsExtensions.UploadAsync
  • BlobOperationsExtensions.UploadAsync calls BlobOperations.UploadWithHttpMessagesAsync
  • BlobOperations.UploadWithHttpMessagesAsync sets up an HttpRequestMessage, setting its Content property to an instance of StreamContent with the supplied Stream
  • This HttpRequestMessage is then delivered using HttpClient.SendAsync
  • HttpClient.SendAsync handles the message using the HttpClientHandler class
  • The PrepareAndStartContentUpload method in the HttpClientHandler class checks if the HttpContent object it has been given specifies a Content-Length header. If no Content-Length header is provided, then it buffers the content in order to determine its length.

But, the HttpContent in question is the StreamContent initialized earlier by UploadWithHttpMessagesAsync in BlobOperations, and it does not set a Content-Length header.
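The buffering behavior described above can be sketched in isolation, outside the SDK. This is not the SDK's own code; `uploadUri` and `sourceStream` are placeholders. It shows the two ways HttpClient can avoid buffering a StreamContent: a known Content-Length (which StreamContent.TryComputeLength can only supply for seekable streams) or chunked transfer encoding.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Illustrative sketch: how HttpClient decides whether to buffer a body.
Uri uploadUri = new Uri("https://example.invalid/upload"); // placeholder
Stream sourceStream = GetSourceStream();                   // placeholder

using var client = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Put, uploadUri)
{
    Content = new StreamContent(sourceStream)
};

if (sourceStream.CanSeek)
{
    // A seekable stream has a known Length, so Content-Length can be
    // set up front and HttpClient does not need to buffer.
    request.Content.Headers.ContentLength = sourceStream.Length;
}
else
{
    // For an unseekable stream, opting into chunked transfer encoding
    // tells HttpClient to send the body without knowing its length,
    // instead of buffering it to compute one.
    request.Headers.TransferEncodingChunked = true;
}

using var response = await client.SendAsync(request);
```

If neither branch applies (no Content-Length, no chunked encoding), the handler has to read the whole stream into memory first, which is the behavior this issue reports.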

Expected behavior
Data is transferred from the stream in manageably-sized chunks, so that memory usage is bounded and predictable regardless of the stream size. Specifically, if the Stream is seekable, then the stream's Length should be used for the Content-Length header value automatically. (StreamContent itself uses this strategy elsewhere, in its TryComputeLength method.)

Actual behavior (include Exception or Stack Trace)
If the stream contains 500 MB of data, then process memory usage grows by 500 MB before the blob upload operation actually initiates the network connection to the Azure servers. If the stream contains more than 2 GB of data, the upload fails outright, because MemoryStream refuses to store more than 2 GB.

To Reproduce
Using a BlobClient, call an UploadAsync overload that accepts a Stream for the data. Provide a FileStream pointing at a large file so the effect on process memory is easy to observe.
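A minimal repro along those lines might look like this. The connection string and file path are placeholders; any multi-hundred-megabyte file makes the effect visible in a process monitor.

```csharp
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

// Placeholders for this sketch — substitute real values.
string connectionString = "<storage-connection-string>";
string largeFilePath    = "<path-to-large-file>";

var container = new BlobContainerClient(connectionString, "repro-container");
await container.CreateIfNotExistsAsync();
var blob = container.GetBlobClient("large-file.bin");

// With default options, process memory grows by roughly the file size
// before any network traffic to Azure begins.
await using var file = File.OpenRead(largeFilePath);
await blob.UploadAsync(file, overwrite: true);
```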

Environment:

  • Azure.Storage.Blobs 12.10.0
  • Windows 10, .NET Core 3.1
  • Visual Studio 16.11.3

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (2 by maintainers)

Top GitHub Comments

1 reaction
amishra-dev commented, Oct 31, 2021

@jaschrep-msft can we please create a documentation task to get this documented (improvement to the SDK’s uploadAsync documentation) in the backlog?

0 reactions
logiclrd commented, Nov 2, 2021

Indeed, it appears that the key thing was having StorageTransferOptions fully populated. With default options, it would always try to ascertain the length before starting the upload, buffering the entire stream in the process. But, now that InitialTransferSize and MaximumTransferSize are set, the original unseekable stream can be provided without memory usage growing without bound.
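For concreteness, the options that resolved the issue can be sketched as follows. The 512 KB sizes match the buffer size mentioned below, but the exact values are illustrative and should be tuned for the workload; `unseekableStream` and `blob` are placeholders.

```csharp
using System.IO;
using System.Threading.Tasks;
using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

Stream unseekableStream = GetIncomingStream(); // placeholder
BlobClient blob = GetBlobClient();             // placeholder

var options = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        // With these populated, the client uploads in staged blocks
        // instead of buffering the whole stream to learn its length.
        InitialTransferSize = 512 * 1024, // 512 KB first request
        MaximumTransferSize = 512 * 1024, // 512 KB per subsequent block
        MaximumConcurrency  = 1
    }
};

await blob.UploadAsync(unseekableStream, options);
```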

I just did a transfer of a 2.7 GB file. Memory usage climbed continuously for a good 10% of the transfer, but more and more slowly as time went on; it eventually reversed direction and dropped gradually, fluctuating slowly thereafter.

  • It started at 75 MB and climbed quickly at first.
  • When the transfer reached 275 MB, memory usage was about 143 MB.
  • By 425 MB, it was at 154 MB, and it appeared to have essentially plateaued. Memory usage continued to fluctuate, for instance hitting about 162 MB around the 800 MB point, but it subsequently dropped back down as low as 140 MB.
  • Just before the 2 GB mark, however, it had dropped back as far as 120 MB. By 2 GB it was back up to 132 MB.

I repeated the test with MaximumConcurrency set to 5 with very similar results. In both tests, I interrupted the network to verify it could restart and found no issues.

To be clear, forwarding this file transfer from an incoming web request to an outbound BlobClient.UploadAsync call is the only thing the process was doing.
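That forwarding scenario might be sketched as an ASP.NET Core endpoint like the one below. This is an assumed shape, not the commenter's actual code; `_containerClient` and `_transferOptions` are hypothetical injected fields.

```csharp
using System.Threading.Tasks;
using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Microsoft.AspNetCore.Mvc;

[ApiController]
public class UploadController : ControllerBase
{
    private readonly BlobContainerClient _containerClient;     // injected (assumed)
    private readonly StorageTransferOptions _transferOptions;  // preconfigured (assumed)

    public UploadController(BlobContainerClient containerClient)
    {
        _containerClient = containerClient;
        _transferOptions = new StorageTransferOptions
        {
            InitialTransferSize = 512 * 1024,
            MaximumTransferSize = 512 * 1024
        };
    }

    [HttpPut("upload/{name}")]
    public async Task<IActionResult> Upload(string name)
    {
        var blob = _containerClient.GetBlobClient(name);

        // Request.Body is not seekable, so the transfer options must be
        // populated for the client to stream in blocks rather than
        // buffering the whole request body.
        await blob.UploadAsync(Request.Body, new BlobUploadOptions
        {
            TransferOptions = _transferOptions
        });

        return Ok();
    }
}
```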

I’m not sure why memory usage increased so much with a single upload thread and a 512 KB buffer size. Perhaps it is something within the ASP.NET Core stack or Kestrel. In any case, it didn’t end up trying to store the entire 2.7 GB file in a MemoryStream as it had originally.

Documentation that calls out the significance of the stream being seekable and the importance of InitialTransferSize and MaximumTransferSize for non-seekable streams would indeed be very helpful. 😃


Top Results From Across the Web

[BUG] BlobClient.UploadAsync never completes #12811
BlobClient.UploadAsync never completes if it is called with a Stream where Position isn't 0. Expected behavior: it should write the stream from ...

Why does BlobClient.UploadAsync hang when uploading ...
I'm trying to upload JSON to an Azure blob via a memory stream. When I call UploadAsync my application hangs. If I move...

Tuning your uploads and downloads with the Azure ...
To ensure resiliency, if a stream isn't seekable, the Storage client libraries will buffer the data for each individual REST call before ...

Dos and Don'ts for Streaming File Uploads to Azure Blob ...
Implement file uploads the wrong way, and you may end up with memory leaks, server slowdowns, out-of-memory errors, and worst of all, unhappy...

Error while copying content to a stream. (blobClient. ...
In this line await blobClient.UploadAsync(stream); I keep getting the same exception. System.AggregateException: 'Retry failed after 6 tries.
