Performance worsens greatly when downloading blobs with long breaks in between
See original GitHub issue- Package Name: azure-storage-blob
- Package Version: 12.6.0 (but it also happens on 12.13.1)
- Operating System: linux (but it doesn’t really matter, it happens on my mac too)
- Python Version: 3.8
Describe the bug When using the asyncio SDK, if we wait more than 10 seconds between blob downloads the download time becomes 10-100 times slower.
To Reproduce I wrote this script to demonstrate the issue:
import random
from asyncio import gather, sleep, run
from time import perf_counter
from azure.storage.blob.aio import BlobServiceClient, ContainerClient
all_times = []
async def download(client: ContainerClient, path: str):
s = perf_counter()
downloader = await client.download_blob(path)
await downloader.readall()
e = (perf_counter() - s) * 1000
all_times.append(e)
print(f"download took {e}ms file={path}")
def get_url_and_key():
### censored for security reasons ###
async def do():
account_url, account_key = get_url_and_key()
blob_client = BlobServiceClient(account_url, account_key)
container_client = blob_client.get_container_client('some-container')
file_pool = []
async for x in container_client.list_blobs():
if len(file_pool) > 50:
break
file_pool.append(x.name)
for i in range(30):
files = set()
while len(files) < 3:
files.add(file_pool[random.randint(0, len(file_pool) - 1)])
await gather(*[download(container_client, file) for file in files])
await sleep(20)
await container_client.close()
await blob_client.close()
print(f"average is {sum(all_times) / len(all_times)}ms")
run(do())
what this script essentially does:
- find the names of 50 blobs that exist in the container
- randomly choose 3 of them to download
- download them asynchronously (gather 3 tasks) and print the download times
- sleep a certain amount of seconds
- repeat (30 times) and print the average download time
if the sleep time is set to 10 seconds or below I get expectedly low download times:
download took 23.274756036698818ms file=+1b2+N3X7cgkoaajoUSlCA== download took 31.523440033197403ms file=+/K/vccgIKxIsFW6tk8l3Q== download took 44.199715834110975ms file=+/dVLONMzj3tMtql+Zv7Jg== download took 33.404441084712744ms file=+/UeV81yZRT+Eh5jeUKlOA== . . . download took 12.770875822752714ms file=+16GIyseiqsaO8T+mrXLQg== download took 9.113383013755083ms file=+1YF6MfP5Xb+sWEqgr36LQ== download took 7.825685199350119ms file=+/ZJM5EQ40nni2+zKjJA4A== download took 7.158686872571707ms file=+1b2+N3X7cgkoaajoUSlCA== average is 20.692046359181404ms
but if I set it to 20 seconds for example, the download times become insanely high (beyond the first batch):
download took 9.756594430655241ms file=+0huQr/6tn7zw5+p2fT/2A== download took 20.794587209820747ms file=+1b2+N3X7cgkoaajoUSlCA== download took 45.60287203639746ms file=+1h1V/YFgN130/9kzcIygQ== download took 885.5280890129507ms file=++nRJgH2HUF30/9kzcIygQ== download took 884.8122912459075ms file=+1YF6MfP5Xb+sWEqgr36LQ== download took 905.2660423330963ms file=+/soiSTB/5Wb0zHn28+O6Q== . . . download took 930.0287216901779ms file=+1ZAwh3CRGuuBvav8e/34w== download took 927.8017273172736ms file=+1QWh0iEdgVt/VX2PlPDpw== download took 940.912198740989ms file=+/FxOL5ySAQaO8T+mrXLQg== download took 639.4636919721961ms file=++O8gZmF/40FPYL5+VRhAA== download took 664.1455427743495ms file=+/soiSTB/5Wb0zHn28+O6Q== download took 669.6559321135283ms file=++zejjUIwmWuBvav8e/34w== average is 463.2759254503374ms
I’ve run this countless times with different sleep times on my own machine (macOS) and on the production machine (linux) at different traffic loads and with different storage accounts - and always got the same results, so we can rule out load issues on the azure blob storage, or file size variance.
It seems to me that after waiting a certain amount of time the connection with the azure blob storage dies and takes forever to revive. It seems like a bug to me, but if there’s a way to tweak that so the connection stays live for 3 minutes of inactivity for example (eliminating the need to revive it), it would solve the issue for now.
We’ve moved to the async SDK to be able to download many files asynchronously and reduce our response time, but it seems like the opposite has happened and it greatly increased it, to the point this SDK is unusable to us. Quick assistance will be greatly appreciated.
Issue Analytics
- State:
- Created a year ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Hi @asafalon1, we have done some fairly extensive investigation into this issue, and we believe this is mainly being caused by TCP Connection re-use in the
aiohttp
layer.When a TCP Connection is first opened to the server, it takes additional time to establish that connection. Additionally, if configured, an existing TCP Connection can be kept open (alive) for a period of time where the connection will not need to be re-established and can be re-used. For
aiohttp
, the HTTP library behind our async library, the default value for this this idle timeout is 15 seconds. So, when we sleep for 20 seconds, the TCP connection that was open by the first set of requests is closed and it takes time to reopen this request on the next set of downloads. Additionally, my guess for why this affects all executing threads is probably because there would be one connection per thread in this case.What we don’t fully understand yet is why the additional download time is not fixed and seems to scale with the blob size. The time taken to establish a TCP connection should be fixed but it seems there is something additional happening in
aiohttp
causing this additional time to scale with the response size.The good news is that this idle timeout is configurable on the
aiohttp
side and we have found that increasing it beyond how long you are sleeping, seems to fix the issue. Currently the best way to configure this would be to provide and configure your ownClientSession
object to theBlobServiceClient
. This can be done like this:The one caveat here is that you are no longer using the default Session that
azure-core
provides for you which may mean missing out on some of our default configuration. Currently we don’t add much additional configuration beyond the defaults so it may not be a problem. You can see whereazure-core
constructs theClientSession
here so you may want to consider adding this additional configuration as well. We are currently thinking about ways to improve specifying this configuration.Hopefully that information helps you. Please let me know if you have any further questions. Thanks.
CC @xiangyan99 and @annatisch
Hi @asafalon1, since you haven’t asked that we “
/unresolve
” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve
” to reopen the issue.