
[cheroot==8.1.0 regression] Occasional connection resets with concurrent requests


I’m submitting a …

  • [x] bug report
  • [ ] feature request
  • [ ] question about the decisions made in the repository

Do you want to request a feature or report a bug?

Reporting a bug

What is the current behavior?

When running concurrent requests against a CherryPy server, the client occasionally gets ‘Connection reset by peer’ errors.

If the current behavior is a bug, please provide the steps to reproduce and, if possible, screenshots and logs of the problem. If you can, show us your code.

This can be reproduced by running the following server and client code. Tested on both macOS with Python 3.7 and Ubuntu Bionic with Python 3.6; both show the same connection errors:

ERROR:test:FAILED http://localhost:5000/info: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
(...)
ERROR:test:FAILED http://localhost:5000/info: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
ERROR:test:FAILED http://localhost:5000/info: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
INFO:test:921 out of 1000 successful

When reproducing locally, the issue shows up occasionally with 20 workers and on every run of the client script with 50 workers. I also tried various thread pool and socket queue size settings (see the commented-out options in the server code below) but couldn’t get rid of the errors.

Server code:

import logging
import cherrypy

logging.basicConfig(level=logging.DEBUG)


class TestAPI:

    @cherrypy.expose
    def info(self):
        return 'test\n'


def main():
    route_map = cherrypy.dispatch.RoutesDispatcher()
    api = TestAPI()
    route_map.connect('info', '/info', controller=api, action='info')

    conf = {
        '/': {
            'request.dispatch': route_map
        }
    }

    cherrypy.tree.mount(api, '/', config=conf)

    cherrypy.config.update({
        'global': {
            'environment': 'production',
            'server.socket_host': '0.0.0.0',
            'server.socket_port': 5000,
            # 'server.thread_pool': 20,
            # 'server.thread_pool_max': 20,
            # 'server.socket_queue_size': 128,
            # 'server.accepted_queue_size': 100000,
        },
    })

    cherrypy.engine.start()
    cherrypy.engine.block()


if __name__ == '__main__':
    main()

Client code:

from concurrent.futures.thread import ThreadPoolExecutor
from urllib.parse import urljoin

import requests
import logging

baseurl = 'http://localhost:5000/'

log = logging.getLogger('test')


def configure_logging(context):
    logger = logging.getLogger(context)
    logger.setLevel(logging.DEBUG)


def request_info(session):
    url = urljoin(baseurl, 'info')
    try:
        r = session.get(url)
    except requests.exceptions.ConnectionError as e:
        log.error('FAILED %s: %s', url, e)
        return False
    r.raise_for_status()
    return True


def main():
    session = requests.session()
    a = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1000)
    session.mount('http://', a)

    count = 1000
    success = 0
    with ThreadPoolExecutor(max_workers=50) as tp:
        tasks = []
        for i in range(count):
            tasks.append(tp.submit(request_info, session))

        for task in tasks:
            if task.result():
                success += 1

    log.info('%s out of %s successful', success, count)


if __name__ == "__main__":
    logging.basicConfig()
    configure_logging('test')
    main()

We see the same behavior when running a production service behind Apache as a reverse proxy.
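
Not part of the original report, but a common client-side mitigation while a server-side fix is pending is to let urllib3 retry the failed GETs. The sketch below only illustrates that idea (the Retry parameters are assumptions, not values from this thread); it builds the same kind of session as the client script above, just with retries enabled.

# Hedged sketch: retry transient connection resets on idempotent GETs.
# urllib3 counts ProtocolError ("Connection reset by peer") as a read
# error, so GET requests are retried up to `read` times with backoff.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(pool_maxsize=1000):
    retry = Retry(connect=3, read=3, backoff_factor=0.1)
    adapter = HTTPAdapter(pool_connections=1,
                          pool_maxsize=pool_maxsize,
                          max_retries=retry)
    session = requests.session()
    session.mount('http://', adapter)
    return session

This hides the resets from the client but does not address the underlying cheroot behavior.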

What is the expected behavior?

There are no ‘Connection reset by peer’ errors.

What is the motivation / use case for changing the behavior?

Please tell us about your environment:

  • Cheroot version: 8.2.1
  • CherryPy version: 18.5.0
  • Python version: 3.6.9 (Ubuntu Bionic) and 3.7 (macOS)
  • OS: macOS and Ubuntu Bionic
  • Browser: requests library and Apache reverse proxy

Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, gitter, etc.)

Note that the problem doesn’t reproduce when running Tornado as the HTTP server:

import logging
import cherrypy

logging.basicConfig(level=logging.DEBUG)


class TestAPI:

    @cherrypy.expose
    def info(self):
        return 'test\n'


def main():
    import tornado
    import tornado.httpserver
    import tornado.ioloop
    import tornado.wsgi

    route_map = cherrypy.dispatch.RoutesDispatcher()
    api = TestAPI()
    route_map.connect('info', '/info', controller=api, action='info')

    conf = {
        '/': {
            'request.dispatch': route_map
        }
    }

    wsgiapp = cherrypy.tree.mount(api, '/', config=conf)

    # Disable the autoreload plugin, which won't play well with Tornado
    cherrypy.config.update({'engine.autoreload.on': False})

    # let's not start the CherryPy HTTP server
    cherrypy.server.unsubscribe()

    # use CherryPy's signal handling
    cherrypy.engine.signals.subscribe()

    # Prevent CherryPy logs from being propagated
    # to the Tornado logger
    cherrypy.log.error_log.propagate = False

    # Run the engine but don't block on it
    cherrypy.engine.start()

    # Run the Tornado stack
    container = tornado.wsgi.WSGIContainer(wsgiapp)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(5000)

    # Publish to the CherryPy engine as if
    # we were using its mainloop
    tornado.ioloop.PeriodicCallback(
        lambda: cherrypy.engine.publish('main'), 100).start()
    tornado.ioloop.IOLoop.instance().start()


if __name__ == '__main__':
    main()

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

1 reaction
the-allanc commented, Apr 6, 2020

I know what the cause of this is and I’ve got an in-progress fix for it. I’ll post an update shortly.

1 reaction
tobiashenkel commented, Feb 3, 2020

Thanks, I re-tested and checked the mentioned settings: net.core.somaxconn defaults to 128, and net.ipv4.tcp_abort_on_overflow defaults to 0 as well.

Re-tested with somaxconn and socket_queue_size set to both 128 and 1024 on Ubuntu, with no change in the observed behavior.

The Ubuntu test was done on a freshly created VM in AWS.
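
Not from the original thread, but for anyone who wants to verify the same kernel settings locally, here is a minimal sketch (Linux only) that reads the two sysctls mentioned above directly from /proc/sys; the procfs paths are the standard ones corresponding to those sysctl names.

# Minimal sketch (Linux only): print the kernel settings discussed above
# by reading /proc/sys directly instead of shelling out to sysctl.
from pathlib import Path

SYSCTLS = [
    'net/core/somaxconn',              # listen() backlog cap (128 by default here)
    'net/ipv4/tcp_abort_on_overflow',  # 0 = silently drop overflowing connections instead of sending RST
]

for name in SYSCTLS:
    path = Path('/proc/sys') / name
    try:
        value = path.read_text().strip()
    except OSError as exc:
        value = f'unavailable ({exc})'
    print(f"{name.replace('/', '.')} = {value}")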


