Excessive memory usage on multithreading
I have been trying to debug a “memory leak” in my newly upgraded boto3 application. I am migrating from the original boto 2.49.
My application starts a pool of 100 threads, and every request is queued and dispatched to one of them. Typical memory use for the lifetime of the application was about 1 GB, with peaks of 1.5 GB depending on the operation.
After the upgrade I added one boto3.Session per thread, and I access multiple resources and clients from this session, which are reused throughout the code. In the previous code I had one boto connection of each kind per thread (I use several services: S3, DynamoDB, SES, SQS, MTurk, SimpleDB), so it is pretty much the same layout. Except that each boto3.Session alone increases memory usage immensely, and my application now runs at 3 GB of memory instead.
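The per-thread layout described above can be sketched roughly like this (a hypothetical reconstruction, not the reporter's actual code; the pool takes a `make_clients` factory so the boto3 wiring stays pluggable):

```python
import queue
import threading

def start_pool(num_threads, make_clients):
    """Start a worker pool where each thread builds its own client set
    once (e.g. from a fresh boto3.Session) and reuses it for every job."""
    jobs = queue.Queue()

    def worker():
        clients = make_clients()      # created once per thread, then reused
        while True:
            job = jobs.get()
            if job is None:           # poison pill: shut this worker down
                jobs.task_done()
                break
            try:
                job(clients)          # each queued request reuses the
            finally:                  # long-lived per-thread clients
                jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    return jobs, threads
```

With boto3 the factory would be something like `lambda: boto3.Session().client("s3")` (or a dict of several clients), giving one Session per thread as described.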
How do I know it is the boto3 Session, you ask? I created two demo experiments with the same 100 threads; the only difference between them is that one uses boto3 and the other does not.
Program 1: https://pastebin.com/Urkh3TDU
Program 2: https://pastebin.com/eDWPcS8C (same thing with the 5 lines touching boto commented out)
Output program 1 (each print happens 5 seconds after the last one):
Process Memory: 39.4 MB
Process Memory: 261.7 MB
Process Memory: 518.7 MB
Process Memory: 788.2 MB
Process Memory: 944.5 MB
Process Memory: 940.1 MB
Process Memory: 944.4 MB
Process Memory: 948.7 MB
Process Memory: 959.1 MB
Process Memory: 957.4 MB
Process Memory: 958.0 MB
Process Memory: 959.5 MB
Now with plain multiple threads and no AWS access. Output program 2 (each print happens 5 seconds after the last one):
Process Memory: 23.5 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
The boto3 Session object alone retains about 10 MB per thread, roughly 1 GB in total. That is not acceptable for an object that should do little more than issue requests to the AWS servers. It means the Session is holding on to a lot of unwanted data.
You might wonder whether it is the resource that is keeping memory alive. It is not: if you move the resource creation inside the for loop, the program still hits 1 GB in the same 15 to 20 seconds.
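The two demo programs are only linked above, but their shape is roughly this (a hypothetical sketch, not the pastebin code; the memory probe here uses the stdlib `resource` module, which is an assumption — the original may have measured differently):

```python
import resource
import sys
import threading
import time

def peak_rss_mb():
    # Peak resident set size of this process.
    # ru_maxrss is kilobytes on Linux, bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss //= 1024
    return rss / 1024.0

def thread_body(hold_seconds):
    import boto3                     # assumed installed
    session = boto3.Session()        # the object under suspicion
    s3 = session.resource("s3")      # comment these two lines out for program 2
    time.sleep(hold_seconds)

def main():
    for _ in range(100):
        threading.Thread(target=thread_body, args=(120,), daemon=True).start()
    for _ in range(10):
        print(f"Process Memory: {peak_rss_mb():.1f} MB")
        time.sleep(5)
```

Running it with and without the two boto3 lines reproduces the contrast between the two outputs above.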
Early on I tried garbage-collecting cyclic references, but it was futile: the decrease in memory was only a couple of megabytes.
I’ve seen people reporting something similar (maybe not!) on the botocore project, so it might be a shared issue: https://github.com/boto/botocore/issues/805
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 14
- Comments: 31 (3 by maintainers)
Top GitHub Comments
Confirmed: just the simple creation of a boto3.session in threads/async handlers leads to extensive memory usage that is not freed at all (gc.collect() doesn’t help either).

@cschloer @longbowrocks I created this issue 2 years ago and the situation is unchanged since. My solution at the time, which is running today on hundreds of servers I have deployed, is exactly that: a local cache that I add to the current thread object.
Below is the code I use (slightly edited) to replace the resource and client boto3 functions. It is thread-safe, does not need to explicitly create sessions, and your code doesn’t need to be aware it is running inside a separate thread. You might need to do some cleanup to avoid open-file warnings when terminating threads. There are limitations to this and I offer no guarantees. Use with caution.