
High memory usage in multiple async files upload

See original GitHub issue
  • Package Name: azure-storage-blob
  • Package Version: 12.12.0, 12.14.0
  • Operating System: linux
  • Python Version: 3.6.9, 3.8, 3.10

Describe the bug
We are using the azure.storage.blob.aio package to upload multiple files to our storage container. To make the uploads efficient, we create a batch of async upload tasks and execute them all with await asyncio.gather(*tasks). After some time, we noticed very high memory consumption in the container running this app, and it keeps increasing. I tried to investigate what is using all the memory, and it appears that every call to the SDK's blob_client.upload_blob adds a few MB to the process memory without releasing it.

To Reproduce
Steps to reproduce the behavior: I was able to reproduce the issue with the following snippet.

import asyncio
import os

from azure.storage.blob.aio import ContainerClient
from memory_profiler import profile

# CONTAINER_NAME, TEST_FILE and get_connection_string() are defined elsewhere in the app.

@profile
async def upload(storage_path):
    # A fresh ContainerClient is created for every single upload.
    async with ContainerClient.from_connection_string(conn_str=get_connection_string(), container_name=CONTAINER_NAME) as container_client:
        blob_client = container_client.get_blob_client(blob=storage_path)
        with open(TEST_FILE, 'rb') as file_to_upload:
            await blob_client.upload_blob(file_to_upload, length=os.path.getsize(TEST_FILE), overwrite=True)
        await blob_client.close()


@profile
async def run_multi_upload(n):
    # Build n upload coroutines and run them all concurrently.
    tasks = []
    for i in range(n):
        tasks.append(upload(f"storage_client_memory/test_file_{i}"))
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    asyncio.run(run_multi_upload(100))

Expected behavior
I was expecting normal memory consumption, since I am not explicitly loading anything unusual into memory.

Screenshots
I used the memory_profiler package to check the reason for the high memory consumption; this is its output for the snippet above.

For a single async file upload, we can see that blob_client.upload_blob adds a few MB to memory:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    18    114.3 MiB    104.3 MiB          76   @profile
    19                                         async def upload(storage_path):
    20    114.3 MiB      3.6 MiB          76       async with ContainerClient.from_connection_string(conn_str=get_connection_string(), container_name=CONTAINER_NAME) as container_client:
    21    114.3 MiB      3.8 MiB          76           blob_client = container_client.get_blob_client(blob=storage_path)
    22    114.3 MiB      0.0 MiB          76           with open(TEST_FILE, 'rb') as file_to_upload:
    23    125.6 MiB     13.8 MiB         474               await blob_client.upload_blob(file_to_upload, length=os.path.getsize(TEST_FILE), overwrite=True)
    24    125.6 MiB      0.0 MiB         172           await blob_client.close()

And in total, await asyncio.gather(*tasks) adds 24.9 MiB:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    25     99.8 MiB     99.8 MiB           1   @profile
    26                                         async def run_multi_upload(n):
    27     99.8 MiB      0.0 MiB           1       tasks = []
    28     99.8 MiB      0.0 MiB         101       for i in range(n):
    29     99.8 MiB      0.0 MiB         100           tasks.append(upload(f"storage_client_memory/test_file_{i}"))
    30    124.7 MiB     24.9 MiB           2       await asyncio.gather(*tasks)
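
For reference, reports like the ones above can be produced by decorating a coroutine with memory_profiler's profile and running the script; this is a minimal stand-alone sketch (the function name and the allocation are mine, not from the issue):

import asyncio

from memory_profiler import profile


@profile
async def allocate_and_wait():
    buffer = bytearray(10 * 1024 * 1024)  # ~10 MiB, so the Increment column is visible
    await asyncio.sleep(0.1)
    return len(buffer)


if __name__ == '__main__':
    # Running the script prints a line-by-line memory report once the
    # decorated coroutine finishes.
    asyncio.run(allocate_and_wait())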

Additional context
My app runs in a Kubernetes cluster as a sidecar container and constantly uploads files from the cluster to our storage. I run the uploads in batches of 20 async tasks. During a 30-minute stress test in which I uploaded ~30,000 files, the container reached 1.5 GB of memory consumption.
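
For illustration only (the issue does not include the batching code), one common way to keep at most 20 uploads running concurrently is to bound asyncio.gather with a semaphore; upload() here is the coroutine from the reproduction snippet above:

import asyncio

MAX_CONCURRENT_UPLOADS = 20  # "batches of 20 async tasks" from the description


async def bounded_upload(semaphore, storage_path):
    # The semaphore caps how many upload() coroutines run at the same time.
    async with semaphore:
        await upload(storage_path)


async def upload_batch(storage_paths):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_UPLOADS)
    await asyncio.gather(*(bounded_upload(semaphore, path) for path in storage_paths))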

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
jalauzon-msft commented, Nov 14, 2022

Hi @morpel, creating a single ContainerClient instance on startup and using it for all your requests should be fine and would be the recommended approach. This should work across threads as well.
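
A minimal sketch of that recommendation, reusing the same CONTAINER_NAME, TEST_FILE and get_connection_string() helpers as the reproduction snippet (this is an illustration, not code posted by the maintainer):

import asyncio
import os

from azure.storage.blob.aio import ContainerClient


async def upload_all(storage_paths):
    # One ContainerClient for the whole batch (or for the app's lifetime),
    # instead of a new client per upload.
    async with ContainerClient.from_connection_string(
        conn_str=get_connection_string(), container_name=CONTAINER_NAME
    ) as container_client:

        async def upload_one(storage_path):
            blob_client = container_client.get_blob_client(blob=storage_path)
            with open(TEST_FILE, 'rb') as file_to_upload:
                await blob_client.upload_blob(
                    file_to_upload, length=os.path.getsize(TEST_FILE), overwrite=True
                )

        await asyncio.gather(*(upload_one(path) for path in storage_paths))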

1 reaction
swathipil commented, Oct 25, 2022

Hi @morpel - Thanks for the detailed report! We’ll investigate asap!

Read more comments on GitHub >

Top Results From Across the Web

  • High memory usage when uploading and manipulating many ...
    With Firefox, when I drag in ~1GB of files, Firefox's memory usage steadily rises during the upload and stays high even after the...
  • Upload large amounts of random data in parallel to Azure ...
    Learn how to use the Azure Storage client library to upload large amounts of random data in parallel to an Azure Storage account....
  • A Memory-Friendly Way of Reading Files in Node.js
    It takes only one line of code to read a file and then a single for loop to iterate over the content: The...
  • Asynchronous File Uploads - Janko's Blog
    A vanilla file upload implementation where all of this is synchronous has two main downsides: (a) the UI is blocked during these actions,...
  • File uploads — gql 3 3.5.0b0 documentation
    If the files are not too big and you have enough RAM, it is not a problem. On another hand if you want...
