High memory usage in multiple async files upload
- Package Name: azure-storage-blob
- Package Version: 12.12.0, 12.14.0
- Operating System: linux
- Python Version: 3.6.9, 3.8, 3.10
Describe the bug
We are using the azure.storage.blob.aio package to upload multiple files to our storage container. To make the uploads efficient, we create a batch of async upload tasks and execute them all with await asyncio.gather(*tasks).
After some time, we noticed that the memory consumption of the container running this app is very high and keeps increasing.
I tried to investigate what is using all the memory, and it appears that every call to the SDK's blob_client.upload_blob adds a few MB to memory without releasing it.
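As a cross-check (not part of the original report), the same growth can also be confirmed with the standard-library tracemalloc module by comparing snapshots taken before and after a batch of uploads; the sketch below is illustrative only and assumes the upload() coroutine from the repro snippet that follows.

import asyncio
import tracemalloc

async def measure_batch():
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Run one batch of uploads here, e.g. with the upload() coroutine from
    # the repro snippet below:
    # await asyncio.gather(*(upload(f"test_file_{i}") for i in range(100)))

    after = tracemalloc.take_snapshot()
    # Print the call sites responsible for the largest allocation growth.
    for stat in after.compare_to(before, 'lineno')[:10]:
        print(stat)

asyncio.run(measure_batch())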
To Reproduce
Steps to reproduce the behavior: I was able to reproduce the issue with the following snippet:
import asyncio
import os

from azure.storage.blob.aio import ContainerClient
from memory_profiler import profile

# CONTAINER_NAME, TEST_FILE and get_connection_string() are defined elsewhere in the app.

@profile
async def upload(storage_path):
    async with ContainerClient.from_connection_string(conn_str=get_connection_string(), container_name=CONTAINER_NAME) as container_client:
        blob_client = container_client.get_blob_client(blob=storage_path)
        with open(TEST_FILE, 'rb') as file_to_upload:
            await blob_client.upload_blob(file_to_upload, length=os.path.getsize(TEST_FILE), overwrite=True)
        await blob_client.close()

@profile
async def run_multi_upload(n):
    tasks = []
    for i in range(n):
        tasks.append(upload(f"storage_client_memory/test_file_{i}"))
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(run_multi_upload(100))
Expected behavior
I expected normal memory consumption, since I'm not deliberately loading anything unusual into memory.
Screenshots
I used the memory_profiler package to check the reason for the high memory consumption; below is its output for the snippet above.
For a single async file upload, we can see that blob_client.upload_blob adds a few MiB to memory:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
18 114.3 MiB 104.3 MiB 76 @profile
19 async def upload(storage_path):
20 114.3 MiB 3.6 MiB 76 async with ContainerClient.from_connection_string(conn_str=get_connection_string(), container_name=CONTAINER_NAME) as container_client:
21 114.3 MiB 3.8 MiB 76 blob_client = container_client.get_blob_client(blob=storage_path)
22 114.3 MiB 0.0 MiB 76 with open(TEST_FILE, 'rb') as file_to_upload:
23 125.6 MiB 13.8 MiB 474 await blob_client.upload_blob(file_to_upload, length=os.path.getsize(TEST_FILE), overwrite=True)
24 125.6 MiB 0.0 MiB 172 await blob_client.close()
And in total, the await asyncio.gather(*tasks) adds 24.9 MiB:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
25 99.8 MiB 99.8 MiB 1 @profile
26 async def run_multi_upload(n):
27 99.8 MiB 0.0 MiB 1 tasks = []
28 99.8 MiB 0.0 MiB 101 for i in range(n):
29 99.8 MiB 0.0 MiB 100 tasks.append(upload(f"storage_client_memory/test_file_{i}"))
30 124.7 MiB 24.9 MiB 2 await asyncio.gather(*tasks)
Additional context
My app runs in a Kubernetes cluster as a sidecar container and continuously uploads files from the cluster to our storage. I run the uploads in batches of 20 async tasks, as sketched below. During a stress test, after 30 minutes in which I uploaded ~30,000 files, the container reached 1.5 GB of memory consumption.
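For reference, the batching pattern looks roughly like the following sketch; chunked() and upload_all() are illustrative names rather than the actual production code, and upload() is the coroutine from the repro snippet above.

import asyncio

BATCH_SIZE = 20  # the sidecar uploads in batches of 20 async tasks

def chunked(items, size):
    # Yield successive slices of `items` with at most `size` elements each.
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def upload_all(paths):
    # Keep at most BATCH_SIZE uploads in flight at a time; upload() is the
    # coroutine from the repro snippet above.
    for batch in chunked(paths, BATCH_SIZE):
        await asyncio.gather(*(upload(path) for path in batch))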
Top GitHub Comments
Hi @morpel, creating a single ContainerClient instance on startup and using it for all your requests should be fine and would be the recommended approach. This should work across threads as well.

Hi @morpel - Thanks for the detailed report! We'll investigate asap!
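A minimal sketch of that recommendation, reusing one ContainerClient for all uploads; CONTAINER_NAME, TEST_FILE and get_connection_string() are the placeholders from the repro snippet, and the exact structure here is illustrative rather than a maintainer-provided example.

import asyncio
import os

from azure.storage.blob.aio import ContainerClient

# CONTAINER_NAME, TEST_FILE and get_connection_string() are defined elsewhere.

async def upload(container_client, storage_path):
    # Reuse the shared client instead of creating a new one per upload.
    with open(TEST_FILE, 'rb') as file_to_upload:
        await container_client.upload_blob(
            name=storage_path,
            data=file_to_upload,
            length=os.path.getsize(TEST_FILE),
            overwrite=True,
        )

async def run_multi_upload(n):
    # One ContainerClient for the whole lifetime of the app.
    async with ContainerClient.from_connection_string(
            conn_str=get_connection_string(), container_name=CONTAINER_NAME) as container_client:
        tasks = [upload(container_client, f"storage_client_memory/test_file_{i}") for i in range(n)]
        await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(run_multi_upload(100))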