High memory usage in multiple async file download

  • Package Name: azure-storage-blob
  • Package Version: 12.14.1
  • Operating System: linux, windows
  • Python Version: 3.7.15, 3.11

Describe the bug
We are using azure.storage.blob.aio to download a large number (~50k) of small (100 kB) blobs. Our program's memory usage keeps growing over time (past 1 GB), and the growth appears to be related to the BlobClient.

To Reproduce
Steps to reproduce the behavior: the script below runs up to 75 concurrent blob downloads. Its memory usage grows steadily as the program iterates through all the blobs (>100k) in a container.

from pathlib import Path
import asyncio

from azure.storage.blob.aio import ContainerClient

base_folder = Path("C:/Temp/Azure")
container_url = "MY_CONTAINER_URL"

async def download_blob(container_client, blob_name):
    dest_file = base_folder / blob_name
    dest_file.parent.mkdir(parents=True, exist_ok=True)
    with open(dest_file, "wb") as fp:
        stream = await container_client.download_blob(blob_name)
        data = await stream.readall()
        fp.write(data)

async def main():
    background_tasks = set()
    sem = asyncio.Semaphore(75)
    async with ContainerClient.from_container_url(container_url) as cc:
        async for blob_name in cc.list_blob_names():
            await sem.acquire()
            task = asyncio.create_task(download_blob(cc, blob_name))
            background_tasks.add(task)
            task.add_done_callback(background_tasks.discard)
            task.add_done_callback(lambda x: sem.release())

        # Wait for the remaining downloads before the client is closed.
        await asyncio.gather(*background_tasks)

asyncio.run(main())

Edit: Changed the script to call download_blob() on the container client instead of creating a blob_client.
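
For comparison, here is a minimal sketch of another way to cap concurrency: a fixed pool of workers fed from a bounded queue, so at most `concurrency` downloads and queued names exist at any time. It is not from the issue and does not touch Azure at all. `fake_download`, `worker`, and `run_pool` are hypothetical names, and `fake_download` is a stand-in that simply returns 100 kB of bytes where the real script would call the SDK.

```python
import asyncio


async def fake_download(name: str) -> bytes:
    # Hypothetical stand-in for the real blob download; yields once to the
    # event loop and returns a 100 kB payload.
    await asyncio.sleep(0)
    return b"x" * 100_000


async def worker(queue: asyncio.Queue, sizes: dict, stats: dict) -> None:
    while True:
        name = await queue.get()
        if name is None:  # sentinel: no more work for this worker
            return
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            data = await fake_download(name)
            sizes[name] = len(data)  # keep only the size, not the payload
        finally:
            stats["active"] -= 1


async def run_pool(names, concurrency: int = 75):
    # The bounded queue applies backpressure to the producer loop below.
    queue: asyncio.Queue = asyncio.Queue(maxsize=concurrency)
    sizes: dict = {}
    stats = {"active": 0, "peak": 0}
    workers = [
        asyncio.create_task(worker(queue, sizes, stats))
        for _ in range(concurrency)
    ]
    for name in names:
        await queue.put(name)
    for _ in workers:
        await queue.put(None)  # one sentinel per worker
    await asyncio.gather(*workers)
    return sizes, stats["peak"]
```

Because each worker drops its payload before taking the next name, the number of payloads held in memory at once should stay at or below `concurrency`, assuming the download call itself does not retain data.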

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
jalauzon-msft commented, Nov 8, 2022

Hi @tboerstad Thomas, thanks for the info. My current thinking is that you may be correct that this is related to #27023, which we are actively investigating. For now, we’ll treat this as the same issue. Please follow the other issue for updates.

0 reactions
tboerstad commented, Nov 23, 2022

@jalauzon-msft I have tested again, and I am also unable to see any memory issue. If there is a memory problem in the code I’m working on, it must be unrelated to the Blob SDK.

Thank you for your clarifications, you have helped solve my issue.
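
One way to check whether memory keeps growing across repeated downloads is to compare `tracemalloc` readings after each batch. The sketch below is only an illustration and is not taken from the issue. `fake_download` and `measure_growth` are hypothetical names, and the stub download stands in for the real SDK call, so the code shows the measurement technique rather than the SDK's behavior.

```python
import asyncio
import tracemalloc


async def fake_download(name: str) -> bytes:
    # Hypothetical stand-in for the real blob download (100 kB payload).
    await asyncio.sleep(0)
    return b"x" * 100_000


async def measure_growth(batches: int = 5, batch_size: int = 200) -> list:
    """Return the traced Python heap size (in bytes) after each batch."""
    tracemalloc.start()
    readings = []
    try:
        for batch in range(batches):
            for i in range(batch_size):
                data = await fake_download(f"blob-{batch}-{i}")
                del data  # drop the payload before the next download
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings
```

If the readings level off after the first batch, the downloaded payloads are being released; a steady climb of roughly one payload per download would point to something holding on to the data. Note that `tracemalloc` only tracks allocations made through Python's allocator, so memory held by native code may not appear in these numbers.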
