High memory usage in multiple async file download
- Package Name: azure-storage-blob
- Package Version: 12.14.1
- Operating System: linux, windows
- Python Version: 3.7.15, 3.11
Describe the bug
We are using azure.storage.blob.aio to download a large number (~50k) of small (100 kB) blobs.
The memory usage (> 1 GB) of our program increases indefinitely over time, and it seems to be related to the BlobClient.
To Reproduce
Steps to reproduce the behavior: The script below starts up to 75 concurrent blob downloads. It consistently uses more and more memory as the program iterates through all blobs (>100k) in a container.
from pathlib import Path
import asyncio

from azure.storage.blob.aio import ContainerClient

base_folder = Path("C:/Temp/Azure")
container_url = "MY_CONTAINER_URL"


async def download_blob(container_client, blob_name):
    dest_file = base_folder / blob_name
    dest_file.parent.mkdir(parents=True, exist_ok=True)
    with open(dest_file, "wb") as fp:
        stream = await container_client.download_blob(blob_name)
        data = await stream.readall()
        fp.write(data)


async def main():
    background_tasks = set()
    sem = asyncio.Semaphore(75)
    async with ContainerClient.from_container_url(container_url) as cc:
        async for blob_name in cc.list_blob_names():
            await sem.acquire()
            task = asyncio.create_task(download_blob(cc, blob_name))
            background_tasks.add(task)
            task.add_done_callback(background_tasks.discard)
            task.add_done_callback(lambda x: sem.release())
        await asyncio.gather(*background_tasks)


asyncio.run(main())
Edit: Changed the script to call download_blob() on the container client instead of creating a blob_client.
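To check whether the task and semaphore pattern itself retains memory, independent of the SDK, a minimal sketch like the one below may help. It is not part of the original report: `fake_download`, `run`, and `measure` are hypothetical names, and the download is replaced by an in-memory stand-in so that no Azure account is needed. It reuses the same acquire-before-create pattern and reads the traced allocation size with `tracemalloc` once all tasks have finished.

```python
import asyncio
import tracemalloc

MAX_CONCURRENCY = 75  # same limit as in the reproduction script


async def fake_download(name, stats):
    """Hypothetical stand-in for download_blob(...) followed by readall()."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return b"x" * 1024      # small in-memory payload instead of a real blob
    finally:
        stats["active"] -= 1


async def run(num_blobs):
    stats = {"active": 0, "peak": 0, "done": 0}
    tasks = set()
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    def on_done(task):
        # Drop the reference to the finished task and free one slot.
        tasks.discard(task)
        stats["done"] += 1
        sem.release()

    for i in range(num_blobs):
        await sem.acquire()  # blocks once MAX_CONCURRENCY tasks are pending
        task = asyncio.create_task(fake_download(f"blob-{i}", stats))
        tasks.add(task)
        task.add_done_callback(on_done)
    await asyncio.gather(*tasks)
    return stats


def measure(num_blobs):
    """Run the stand-in workload and return (stats, bytes still traced)."""
    tracemalloc.start()
    stats = asyncio.run(run(num_blobs))
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return stats, current


if __name__ == "__main__":
    for n in (1_000, 10_000):
        stats, current = measure(n)
        print(n, stats["peak"], current)
```

If the traced size after the run stays roughly flat as the number of blobs grows, the concurrency pattern is probably not what retains memory, and the next step would be to swap the real download back in and compare.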
Issue Analytics
- Created 10 months ago
- Comments: 8 (4 by maintainers)
Top Results From Across the Web
High memory usage in multiple async files upload #27023
I'm running the uploads in batches of 20 async tasks. After 30 minutes in which I uploaded ~30,000 files, the container reached 1.5Gb...
Top GitHub Comments
Hi @tboerstad Thomas, thanks for the info. My current thinking is that you may be correct in saying this is related to #27023, which we are actively investigating. For now, we’ll treat this as the same issue. Please follow the other issue for updates.
@jalauzon-msft I have tested again, and I am also unable to see any memory issue. If I have something in the code I’m working on, it must be unrelated to the Blob SDK.
Thank you for your clarifications; you have helped solve my issue.