[BUG] BlobClient.UploadAsync with an unseekable stream always buffers the entire stream
Describe the bug
While writing a lightweight service that receives content in a `Stream` and passes it on to a blob upload, I encountered pathological buffering of the entire `Stream` content.
I believe this is caused by the way the blob client implementation uses `HttpClient`:
- `BlobClient.UploadAsync(Stream)` calls the extension method `BlobOperationsExtensions.UploadAsync`.
- `BlobOperationsExtensions.UploadAsync` calls `BlobOperations.UploadWithHttpMessagesAsync`.
- `BlobOperations.UploadWithHttpMessagesAsync` sets up an `HttpRequestMessage`, setting its `Content` property to an instance of `StreamContent` with the supplied `Stream`.
- This `HttpRequestMessage` is then delivered using `HttpClient.SendAsync`.
- `HttpClient.SendAsync` handles the message using the `HttpClientHandler` class.
- The `PrepareAndStartContentUpload` method in the `HttpClientHandler` class checks whether the `HttpContent` object it has been given specifies a `Content-Length` header. If no `Content-Length` header is provided, it buffers the content in order to determine its length.

But the `HttpContent` in question is the `StreamContent` initialized earlier by `UploadWithHttpMessagesAsync` in `BlobOperations`, and it does not set a `Content-Length` header.
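To illustrate the point (this snippet is mine, not SDK code): wrapping a non-seekable stream in `StreamContent` leaves `Content-Length` unset, which is exactly the condition that makes the handler buffer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;

// Illustration only: a GZipStream over a file is a typical non-seekable stream.
using var nonSeekable = new GZipStream(
    File.OpenRead("large.bin"), CompressionMode.Decompress);

var content = new StreamContent(nonSeekable);

// StreamContent.TryComputeLength cannot determine a length for a non-seekable
// stream, so no Content-Length header value is available, and HttpClientHandler
// falls back to buffering the whole body to find out how long it is.
Console.WriteLine(content.Headers.ContentLength.HasValue); // False
```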
Expected behavior
Data is transferred from the stream in manageably-sized chunks, so that memory usage stays bounded and predictable regardless of the stream size. Specifically, if the `Stream` is seekable, then the stream's `Length` should be used for the `Content-Length` header value automatically. (This strategy is used elsewhere inside `StreamContent`, in its `TryComputeLength` method.)
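As a rough sketch of the behavior I would expect for a seekable stream (illustrative code, not a proposed patch): the remaining length is knowable up front, so the header could be populated without any buffering.

```csharp
using System.IO;
using System.Net.Http;

// Sketch: when the stream is seekable, its remaining length can be supplied as
// the Content-Length header, so the handler never needs to buffer the body.
static StreamContent CreateContent(Stream stream)
{
    var content = new StreamContent(stream);
    if (stream.CanSeek)
    {
        content.Headers.ContentLength = stream.Length - stream.Position;
    }
    return content;
}
```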
Actual behavior (include Exception or Stack Trace)
If the stream contains 500 MB of data, then the process memory usage will grow by 500 MB before the blob upload operation actually initiates the network connection to the Azure servers. If the stream contains more than 2 GB of data, then the blob upload operation will fail because `MemoryStream` will refuse to store more than 2 GB of data.
To Reproduce
Using a `BlobClient`, call an `UploadAsync` overload that accepts a `Stream` for the data. Provide a `FileStream` pointing at a large file so the effect on process memory is more easily noticeable.
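A minimal repro sketch (the connection string, container name, blob name, and file path below are placeholders):

```csharp
using System.IO;
using Azure.Storage.Blobs;

// Placeholders: substitute a real connection string, container, blob name, and file.
var blobClient = new BlobClient(connectionString, "repro-container", "large-file.bin");

using FileStream fileStream = File.OpenRead(@"C:\temp\large-file.bin");

// Watch the process's memory while this runs: with default options the entire
// stream is buffered before any network traffic to Azure starts.
await blobClient.UploadAsync(fileStream, overwrite: true);
```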
Environment:
- Azure.Storage.Blobs 12.10.0
- Windows 10, .NET Core 3.1
- Visual Studio 16.11.3
Top GitHub Comments
@jaschrep-msft can we please create a documentation task to get this documented (improvement to the SDK’s uploadAsync documentation) in the backlog?
Indeed, it appears that the key thing was having `StorageTransferOptions` fully populated. With default options, it would always try to ascertain the length before starting the upload, buffering the entire stream in the process. But now that `InitialTransferSize` and `MaximumTransferSize` are set, the original unseekable stream can be provided without memory usage growing without bound. I just did a transfer of a 2.7 GB file, and memory usage climbed continuously for a good 10% of the transfer, but more and more slowly as time went on, eventually reversing direction and gradually dropping, slowly fluctuating.

I repeated the test with `MaximumConcurrency` set to 5 with very similar results. In both tests, I interrupted the network to verify it could restart and found no issues. To be clear, forwarding this file transfer from an incoming web request to an outbound `BlobClient.UploadAsync` call is the only thing the process was doing. I'm not sure why memory usage increased so much with a single upload thread and a 512 KB buffer size. Perhaps it is something within the ASP.NET Core stack or Kestrel. In any case, it didn't end up trying to store the entire 2.7 GB file in a `MemoryStream` as it had originally.

Documentation that calls out the significance of the stream being seekable and the importance of `InitialTransferSize` and `MaximumTransferSize` for non-seekable streams would indeed be very helpful. 😃
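For anyone landing here, this is the shape of the workaround described above; `blobClient` and `incomingRequestStream` are placeholders, and the sizes are just the values from my test, not recommendations.

```csharp
using Azure.Storage;
using Azure.Storage.Blobs.Models;

// Transfer sizes from the test described above; tune for your own workload.
var uploadOptions = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        InitialTransferSize = 512 * 1024,   // 512 KB first request
        MaximumTransferSize = 512 * 1024,   // 512 KB per subsequent block
        MaximumConcurrency = 5
    }
};

// With both transfer sizes set, the unseekable stream is uploaded in blocks
// instead of being buffered in its entirety before the upload starts.
await blobClient.UploadAsync(incomingRequestStream, uploadOptions);
```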