Support for customizing the max connections and max pools

When Boto is used to perform multiple requests concurrently, for example to speed up write throughput to DynamoDB, it uses the default pool parameters of the HTTPAdapter [1], which are currently set to 10.

Connections beyond this number are dropped instead of being returned to the pool for reuse. Therefore, when a new connection is needed and the first 10 are busy, a full TCP handshake is performed, which adds significant latency to the whole operation.
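To make the cost concrete, here is a minimal, deterministic sketch (not botocore or urllib3 code) that models repeated bursts of simultaneous requests against a pool that keeps a fixed number of idle connections. It assumes every request in a burst needs its own connection at the same moment and that all connections are released at the end of the burst; the function name `count_new_connections` is hypothetical.

```python
def count_new_connections(concurrency: int, pool_size: int, bursts: int) -> int:
    """Count connections opened from scratch (full TCP handshake) across
    `bursts` bursts of `concurrency` simultaneous requests, against a pool
    that keeps at most `pool_size` idle connections between bursts."""
    idle = 0      # idle connections currently cached by the pool
    created = 0   # connections that needed a full handshake
    for _ in range(bursts):
        reused = min(idle, concurrency)    # take whatever the pool has cached
        idle -= reused
        created += concurrency - reused    # open the rest from scratch
        # Every connection is released; anything beyond pool_size is discarded.
        idle = min(idle + concurrency, pool_size)
    return created


print(count_new_connections(concurrency=16, pool_size=10, bursts=3))  # 28
print(count_new_connections(concurrency=16, pool_size=16, bursts=3))  # 16
```

With 16 concurrent requests and a pool of 10, every burst after the first still opens 6 connections from scratch; sizing the pool to match the concurrency removes those extra handshakes.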

The idea would be to add the proper fields to customize the max connections and max pools, giving users the freedom to set these values to something other than the defaults.

Does that make sense to you? If it does, I can try to send a pull request with a tentative solution.

[1] https://github.com/boto/botocore/blob/develop/botocore/vendored/requests/adapters.py#L82

Issue Analytics

Created 8 years ago
Reactions: 4
Comments: 10 (3 by maintainers)

Top GitHub Comments

8 reactions

wiltzius commented, Sep 4, 2016

@jamesls your understanding of the issue is correct. urllib3 maintains a cache of idle TCP connections (the “pool”), but it’s set at a fixed size. If there are no connections available from the pool, it creates a new one. When it’s finished with that new connection, it tries to return it to the pool, but if the pool is already full, it simply discards it. urllib3 logs a warning when it does so, since this represents an inefficiency where the pool size is smaller than the number of simultaneous requests: future connections will have to perform the full TCP connection setup. This is not a correctness issue, just an efficiency / performance issue.
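The get/return/discard cycle described above can be sketched with a small stand-in for a urllib3 connection pool. This is a simplified model, not urllib3's code: connections are plain strings, idle connections live in a `queue.LifoQueue` bounded to `maxsize` (urllib3's `HTTPConnectionPool` also keeps its idle connections in a LIFO queue), the class name `ToyConnectionPool` is hypothetical, and the warning text is illustrative rather than urllib3's exact message.

```python
import itertools
import logging
import queue

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("toy_pool")


class ToyConnectionPool:
    """Simplified model of a fixed-size cache of idle connections."""

    def __init__(self, maxsize: int = 10) -> None:
        self.idle = queue.LifoQueue(maxsize=maxsize)  # bounded cache of idle connections
        self._ids = itertools.count(1)
        self.created = 0     # connections opened from scratch (full TCP setup)
        self.discarded = 0   # connections dropped because the pool was full

    def get(self) -> str:
        try:
            return self.idle.get_nowait()      # reuse an idle connection if one exists
        except queue.Empty:
            self.created += 1
            return f"conn-{next(self._ids)}"   # stand-in for a brand-new TCP connection

    def put(self, conn: str) -> None:
        try:
            self.idle.put_nowait(conn)         # hand the connection back to the pool
        except queue.Full:
            self.discarded += 1                # pool is full: the connection is thrown away
            log.warning("Pool is full, discarding %s", conn)


pool = ToyConnectionPool(maxsize=10)
in_use = [pool.get() for _ in range(12)]   # 12 simultaneous requests
for conn in in_use:
    pool.put(conn)
print(pool.created, pool.discarded)        # 12 2
```

A second burst of 12 requests would reuse the 10 cached connections and open 2 new ones, which is exactly the inefficiency (not a correctness problem) described in the comment.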

Both urllib3 and the Python requests library that wraps it expose settings to change the pool size, but because the connections are created under the hood by boto, we don’t have access to these settings from the boto client (without monkeypatching or similar).

Plumbing through the pool_connections setting that the requests HTTPAdapter exposes (as mentioned in the original post on this issue) would allow us to set the pool size to whatever is appropriate for our load. It should also be easy to do: it’s simply a matter of exposing a parameter and passing its value to the underlying requests library. A more general solution would be to expose overrides for all the default HTTPAdapter settings, but this is the only one I really care about.

I’ll note that although this issue is probably worst for services like DynamoDB, in our case we trigger the error when loading more than 10 photos from S3 simultaneously, so hopefully whatever solution you arrive at is not specific to DynamoDB.

Lastly, for reference if you’re curious, here’s the place in the urllib3 connection pool code where the warning is issued:

https://github.com/shazow/urllib3/blob/65b8c52c16dee5c3a523de2c1c21853ba0e581f2/urllib3/connectionpool.py#L257

and here are the docs for the connection pool:

https://urllib3.readthedocs.io/en/1.4/pools.html

and here’s a sort of how-to article on the setting in the requests library with way more information than you probably want on the subject:

https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/

Thanks!

1 reaction

jamesls commented, Sep 8, 2016

Thanks for the info. One last thing I noticed: while pool_connections is the number of connection pools to cache in memory at any given time (one connection pool per host), pool_maxsize is the maximum number of connections to keep in a single connection pool. pool_maxsize is what we’d need to accommodate the multithreaded scenario outlined here, but I’m not sure pool_connections matters that much (it would matter if you’re accessing multiple AWS services). In requests, they use the same default value for both:


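```python
from requests.adapters import BaseAdapter, DEFAULT_POOLBLOCK, DEFAULT_POOLSIZE, DEFAULT_RETRIES

class HTTPAdapter(BaseAdapter):
    def __init__(self, pool_connections=DEFAULT_POOLSIZE,
                 pool_maxsize=DEFAULT_POOLSIZE, max_retries=DEFAULT_RETRIES,
                 pool_block=DEFAULT_POOLBLOCK):
        ...  # body omitted; both pool settings default to DEFAULT_POOLSIZE (10)
```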

I wonder if we should simplify and do something similar: expose a max_poolsize and just set that value for both pool_connections and pool_maxsize.
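The difference between the two settings, and the effect of driving both from one knob, can be modeled with a small two-level cache. This loosely follows how requests hands pool_connections to urllib3's `PoolManager` as `num_pools` (how many per-host pools to keep, evicting the least recently used) and pool_maxsize as each pool's `maxsize` (how many idle connections one host's pool keeps). It is a sketch, not urllib3 code: `ToyPoolManager` and its single `max_poolsize` argument are hypothetical, the host names are placeholders, and connections are plain strings.

```python
from collections import OrderedDict


class ToyPoolManager:
    """Simplified model: an LRU cache of per-host pools of idle connections."""

    def __init__(self, max_poolsize: int = 10) -> None:
        # Hypothetical single knob used for both settings, as proposed above.
        self.num_pools = max_poolsize   # plays the role of pool_connections
        self.maxsize = max_poolsize     # plays the role of pool_maxsize
        self.pools: "OrderedDict[str, list[str]]" = OrderedDict()
        self.created = 0                # connections opened from scratch

    def _pool_for(self, host: str) -> list:
        if host in self.pools:
            self.pools.move_to_end(host)          # mark as most recently used
        else:
            self.pools[host] = []
            if len(self.pools) > self.num_pools:
                self.pools.popitem(last=False)    # evict the least recently used pool
        return self.pools[host]

    def get(self, host: str) -> str:
        pool = self._pool_for(host)
        if pool:
            return pool.pop()                     # reuse an idle connection
        self.created += 1
        return f"{host}#{self.created}"           # stand-in for a new connection

    def put(self, host: str, conn: str) -> None:
        pool = self._pool_for(host)
        if len(pool) < self.maxsize:
            pool.append(conn)                     # otherwise the connection is dropped


mgr = ToyPoolManager(max_poolsize=2)
conns = [mgr.get("dynamodb.example") for _ in range(3)]  # 3 concurrent requests, one host
for c in conns:
    mgr.put("dynamodb.example", c)
print(len(mgr.pools["dynamodb.example"]))  # 2 (pool_maxsize limit: one connection dropped)

for host in ("s3.example", "sqs.example"):  # touching a third host
    mgr.put(host, mgr.get(host))
print(list(mgr.pools))  # ['s3.example', 'sqs.example'] (num_pools limit: oldest pool evicted)
```

In this model, only `maxsize` matters for many threads hitting one service, while `num_pools` only starts to matter once a client talks to more hosts than it can cache pools for, which matches the reasoning in the comment.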