
[cheroot==8.1.0 regression] Occasional connection resets with concurrent requests


I’m submitting a …

  • [x] bug report
  • [ ] feature request
  • [ ] question about the decisions made in the repository

Do you want to request a feature or report a bug?

Reporting a bug

What is the current behavior?

When running concurrent requests against a CherryPy server, the client occasionally gets ‘Connection reset by peer’ errors.

If the current behavior is a bug, please provide the steps to reproduce and, if possible, screenshots and logs of the problem. If you can, show us your code.

This can be reproduced by running the following server and client code. Tested on both macOS with Python 3.7 and Ubuntu Bionic with Python 3.6; both show the same connection errors:

ERROR:test:FAILED http://localhost:5000/info: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
(...)
ERROR:test:FAILED http://localhost:5000/info: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
ERROR:test:FAILED http://localhost:5000/info: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
INFO:test:921 out of 1000 successful

When reproducing locally, the issue shows up occasionally with 20 workers and on every run of the client script with 50 workers. I also tried various thread pool and socket queue size settings (see the commented-out options in the server code below) but couldn’t get rid of the errors.

Server code:

import logging
import cherrypy

logging.basicConfig(level=logging.DEBUG)


class TestAPI:

    @cherrypy.expose
    def info(self):
        return 'test\n'


def main():
    route_map = cherrypy.dispatch.RoutesDispatcher()
    api = TestAPI()
    route_map.connect('info', '/info', controller=api, action='info')

    conf = {
        '/': {
            'request.dispatch': route_map
        }
    }

    cherrypy.tree.mount(api, '/', config=conf)

    cherrypy.config.update({
        'global': {
            'environment': 'production',
            'server.socket_host': '0.0.0.0',
            'server.socket_port': 5000,
            # 'server.thread_pool': 20,
            # 'server.thread_pool_max': 20,
            # 'server.socket_queue_size': 128,
            # 'server.accepted_queue_size': 100000,
        },
    })

    cherrypy.engine.start()
    cherrypy.engine.block()


if __name__ == '__main__':
    main()

Client code:

from concurrent.futures.thread import ThreadPoolExecutor
from urllib.parse import urljoin

import requests
import logging

baseurl = 'http://localhost:5000/'

log = logging.getLogger('test')


def configure_logging(context):
    logger = logging.getLogger(context)
    logger.setLevel(logging.DEBUG)


def request_info(session):
    url = urljoin(baseurl, 'info')
    try:
        r = session.get(url)
    except requests.exceptions.ConnectionError as e:
        log.error('FAILED %s: %s', url, e)
        return False
    r.raise_for_status()
    return True


def main():
    session = requests.session()
    a = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1000)
    session.mount('http://', a)

    count = 1000
    success = 0
    with ThreadPoolExecutor(max_workers=50) as tp:
        tasks = []
        for i in range(count):
            tasks.append(tp.submit(request_info, session))

        for task in tasks:
            if task.result():
                success += 1

    log.info('%s out of %s successful', success, count)


if __name__ == "__main__":
    logging.basicConfig()
    configure_logging('test')
    main()

We see the same behavior when running a production service behind Apache as a reverse proxy.
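
Not part of the original report, but a common client-side mitigation while a server-side fix is pending is to let urllib3 retry the failed GETs. The sketch below only illustrates that idea (the Retry parameters are assumptions, not values from this thread); it builds the same kind of session as the client script above, just with retries enabled.

# Hedged sketch: retry transient connection resets on idempotent GETs.
# urllib3 counts ProtocolError ("Connection reset by peer") as a read
# error, so GET requests are retried up to `read` times with backoff.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(pool_maxsize=1000):
    retry = Retry(connect=3, read=3, backoff_factor=0.1)
    adapter = HTTPAdapter(pool_connections=1,
                          pool_maxsize=pool_maxsize,
                          max_retries=retry)
    session = requests.session()
    session.mount('http://', adapter)
    return session

This hides the resets from the client but does not address the underlying cheroot behavior.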

What is the expected behavior?

There are no ‘Connection reset by peer’ errors.

What is the motivation / use case for changing the behavior?

Please tell us about your environment:

  • Cheroot version: 8.2.1
  • CherryPy version: 18.5.0
  • Python version: 3.6.9 (Ubuntu Bionic) and 3.7 (macOS)
  • OS: macOS and Ubuntu Bionic
  • Browser: requests library and Apache reverse proxy

Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, gitter, etc.)

Note that the problem doesn’t reproduce when running Tornado as the HTTP server:

import logging
import cherrypy

logging.basicConfig(level=logging.DEBUG)


class TestAPI:

    @cherrypy.expose
    def info(self):
        return 'test\n'


def main():
    import tornado
    import tornado.httpserver
    import tornado.ioloop
    import tornado.wsgi

    route_map = cherrypy.dispatch.RoutesDispatcher()
    api = TestAPI()
    route_map.connect('info', '/info', controller=api, action='info')

    conf = {
        '/': {
            'request.dispatch': route_map
        }
    }

    wsgiapp = cherrypy.tree.mount(api, '/', config=conf)

    # Disable the autoreload plugin, which won't play well with Tornado
    cherrypy.config.update({'engine.autoreload.on': False})

    # let's not start the CherryPy HTTP server
    cherrypy.server.unsubscribe()

    # use CherryPy's signal handling
    cherrypy.engine.signals.subscribe()

    # Prevent CherryPy logs from being propagated
    # to the Tornado logger
    cherrypy.log.error_log.propagate = False

    # Run the engine but don't block on it
    cherrypy.engine.start()

    # Run the Tornado stack
    container = tornado.wsgi.WSGIContainer(wsgiapp)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(5000)

    # Publish to the CherryPy engine as if
    # we were using its mainloop
    tornado.ioloop.PeriodicCallback(
        lambda: cherrypy.engine.publish('main'), 100).start()
    tornado.ioloop.IOLoop.instance().start()


if __name__ == '__main__':
    main()

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

1 reaction
the-allanc commented, Apr 6, 2020

I know what the cause of this is and I’ve got an in-progress fix for it. I’ll post an update shortly.

1 reaction
tobiashenkel commented, Feb 3, 2020

Thanks, I re-tested and checked the mentioned settings: net.core.somaxconn defaults to 128, and net.ipv4.tcp_abort_on_overflow defaults to 0 as well.

Re-tested with somaxconn and socket_queue_size set to both 128 and 1024 on Ubuntu, with no change in the observed behavior.

The Ubuntu test was done on a freshly created VM in AWS.
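
Not from the original thread, but for anyone who wants to verify the same kernel settings locally, here is a minimal sketch (Linux only) that reads the two sysctls mentioned above directly from /proc/sys; the procfs paths are the standard ones corresponding to those sysctl names.

# Minimal sketch (Linux only): print the kernel settings discussed above
# by reading /proc/sys directly instead of shelling out to sysctl.
from pathlib import Path

SYSCTLS = [
    'net/core/somaxconn',              # listen() backlog cap (128 by default here)
    'net/ipv4/tcp_abort_on_overflow',  # 0 = silently drop overflowing connections instead of sending RST
]

for name in SYSCTLS:
    path = Path('/proc/sys') / name
    try:
        value = path.read_text().strip()
    except OSError as exc:
        value = f'unavailable ({exc})'
    print(f"{name.replace('/', '.')} = {value}")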


