VERY slow large blob downloads
See original GitHub issue

I am confused about how to optimize BlobClient for downloading large blobs (up to 100 GB).
For example, on a ~480 MB blob the following code takes around 4 minutes to execute:
full_path_to_file = '{}/{}'.format(staging_path, blob_name)
blob = BlobClient.from_connection_string(conn_str=connection_string, container_name=container_name, blob_name=blob_name)
with open(full_path_to_file, "wb") as my_blob:
    download_stream = blob.download_blob()
    result = my_blob.write(download_stream.readall())
In the previous version of the SDK I was able to specify a max_connections parameter that sped up downloads significantly. This appears to have been removed (along with progress callbacks, which is annoying). I have files upwards of 99 GB which will take almost 13 hours to download at this rate, whereas I used to be able to download similar files in under two hours.
How can I optimize the download of large blobs?
Thank you!
Edit: I meant that it took 4 minutes to download a 480 MB file. Also, I am getting memory errors when trying to download larger files (~40 GB).
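For reference, the memory errors and slow speed both trace back to .readall(), which buffers the entire blob in RAM before writing it. A minimal sketch of a streaming alternative, assuming azure-storage-blob v12 where download_blob() accepts a max_concurrency parameter (the successor to max_connections) and the returned StorageStreamDownloader exposes readinto():

```python
def download_blob_to_file(blob_client, path, max_concurrency=8):
    """Stream a blob to `path` without buffering it all in memory.

    `blob_client` only needs a download_blob() method whose result
    supports readinto(), matching azure-storage-blob v12's
    BlobClient / StorageStreamDownloader interface.
    """
    with open(path, "wb") as fh:
        # max_concurrency fetches chunks in parallel; readinto()
        # writes them to the file handle as they arrive instead of
        # materializing the whole blob like readall() does.
        stream = blob_client.download_blob(max_concurrency=max_concurrency)
        stream.readinto(fh)
```

With a real client this would be called as download_blob_to_file(BlobClient.from_connection_string(...), full_path_to_file); the helper name and the concurrency value of 8 are illustrative, not from the SDK.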
Issue Analytics
- State:
- Created 3 years ago
- Comments: 23 (7 by maintainers)
I experienced timeouts on larger downloads as well: commonly above 100 GB, and downloads over 200 GB would always fail when using .readall() (more on that below). Of note, max_concurrency did NOT resolve this for me. It seems the Authorization header timestamp grew older than the accepted 25-minute age limit, so the client isn't refreshing the header automatically. I was able to work around it, in an ugly manner.
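The comment doesn't include the workaround code, but the approach it describes can be sketched as ranged downloads where a fresh client is constructed per range, so each request gets a newly signed Authorization header. This is an assumption-laden reconstruction, not the commenter's actual code: make_client, byte_ranges, and the 256 MB chunk size are hypothetical, while download_blob(offset=..., length=...) and get_blob_properties().size are real azure-storage-blob v12 APIs.

```python
def byte_ranges(total_size, chunk_size):
    """Yield (offset, length) pairs covering total_size bytes."""
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        yield offset, length
        offset += length

def download_in_ranges(make_client, path, chunk_size=256 * 1024 * 1024):
    """Download a blob one range at a time.

    make_client() must return a fresh BlobClient-like object; building
    a new one per range keeps the signed Authorization header well
    inside the 25-minute age limit mentioned above.
    """
    total_size = make_client().get_blob_properties().size
    with open(path, "wb") as fh:
        for offset, length in byte_ranges(total_size, chunk_size):
            blob = make_client()  # re-sign auth for each range
            stream = blob.download_blob(offset=offset, length=length)
            stream.readinto(fh)
```

Rinse and repeat per range until the file is complete; on failure, only the current range needs retrying rather than the whole blob.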
Rinse and repeat until the download completes. Note that I build a checksum as I download: since I know the checksum of the original file, I have high confidence in file integrity and can validate at the end. Performance-wise, on a 1 Gbps link for a single blob out of cool storage I get ~430 Mbps (53.75 MB/s). The Azure-side cool tier limit is around 60 MB/s, so it seems to work pretty well.
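The checksum-as-you-go idea can be sketched with hashlib: update the digest with each chunk as it is written, so no second pass over the file is needed. The helper name is made up; in azure-storage-blob v12 the chunks could come from StorageStreamDownloader.chunks().

```python
import hashlib

def write_with_checksum(chunks, fh, algo="md5"):
    """Write an iterable of byte chunks to fh while building a digest.

    Returns the hex digest, which can be compared against the known
    checksum of the original file once the download finishes.
    """
    h = hashlib.new(algo)
    for chunk in chunks:
        h.update(chunk)  # fold the chunk into the running digest
        fh.write(chunk)
    return h.hexdigest()
```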
Building on @mockodin's fine remarks, I implemented a file-like object on top of the blob object, and it was very successful (it does not use the re-auth trick he mentioned because I did not need it). The download speed improved maybe ten times when using this iterator versus the one included in the SDK. Many thanks to you, @mockodin!
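The commenter's implementation isn't shown, but a read-only file-like wrapper of this kind could look like the sketch below: each read() becomes a ranged download_blob(offset=..., length=...) call. The class and its seek/tell semantics are illustrative assumptions, not the commenter's code or an SDK API.

```python
class BlobReader:
    """Minimal read-only file-like view over a BlobClient-style object.

    Hypothetical sketch: only read/seek/tell are implemented, and each
    read() maps to one ranged download_blob() call.
    """

    def __init__(self, blob_client, size=None):
        self._blob = blob_client
        # Fall back to one properties call when the size isn't known.
        self._size = size if size is not None else blob_client.get_blob_properties().size
        self._pos = 0

    def read(self, n=-1):
        if n < 0:
            n = self._size - self._pos
        n = min(n, self._size - self._pos)
        if n <= 0:
            return b""  # at or past end of blob
        data = self._blob.download_blob(offset=self._pos, length=n).readall()
        self._pos += len(data)
        return data

    def seek(self, pos):
        self._pos = pos
        return self._pos

    def tell(self):
        return self._pos
```

A wrapper like this lets libraries that expect a file object (tarfile, pandas, etc.) read directly from blob storage without a local copy.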