Hazelcast, Flask Gunicorn with Eventlet hangs

Hi,

We are trying to use Hazelcast in a Flask service. The service runs under gunicorn with eventlet workers. With this configuration, the client never connects; after switching to the sync worker, everything works fine. I stepped into the reactor with pdb and found that queue is patched with the queue from eventlet.

I guess the questions are:

  • Is there a way to fix this?

  • Are any gunicorn async workers supported with Hazelcast, e.g. gevent?

Versions:

  • Python = 3.6

  • Hazelcast = 3.12.1

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

mdumandag commented, May 1, 2020 (1 reaction)

@alexjironkin, I am glad that you were able to make it work.

Since this part of the client is in the hot path, I don’t think we can put a sleep there as a permanent fix for now. I can’t say by how much, but I guess it would cause some slowdown. I recommended it because, in your use case, it was the only feasible way.

We are working on the 4.0 release of the client now. In this release, we will be introducing some breaking changes. Maybe we can spend some time before the release to find a way that is both performant and compatible with frameworks like eventlet. So, I am going to keep this issue open for now. If you have ideas about it, please let us know.

mdumandag commented, Apr 24, 2020 (1 reaction)

Hi @alexjironkin. After reading about the execution model of eventlet (and of gevent, since they are similar in this sense) and debugging with a sample application, I found the problem. Let me try to describe it.

Gunicorn monkey patches the thread and threading modules (see https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/geventlet.py#L124; by default, eventlet monkey patches certain system modules, including thread and threading: https://eventlet.net/doc/basic_usage.html#patching-functions). This means that when you start a new thread in a monkey-patched application, it does not run as a standard thread. Instead, it runs as an eventlet coroutine (see the image at https://eventlet.net/doc/threading.html). The problem is caused by how switching between these coroutines works.

If they were standard threads, Python would perform context switches between them even while they were doing blocking work. That has some overhead, but it requires no cooperation between threads.

Eventlet uses a different approach: it relies on cooperation. It requires coroutines to yield when they are about to block, so that the other coroutines can still be executed.

The problem arises from the fact that our reactor module does not yield anywhere in its loop function. So, when the reactor thread is started as a coroutine due to monkey patching, no switch between coroutines happens, and the application becomes unresponsive, executing the instructions inside the loop function all the time.
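To make the starvation concrete, here is a minimal sketch that uses plain Python generators instead of eventlet. The scheduler and the task names (run_cooperatively, greedy_reactor, polite_reactor, request_handler) are hypothetical stand-ins, not part of eventlet or the Hazelcast client; the only assumption is that a task hands control back at a yield, which is the role a monkey-patched blocking call plays under eventlet.

```python
from collections import deque


def run_cooperatively(tasks, max_steps):
    """Round-robin over generator tasks; a task gives up control only at `yield`."""
    ready = deque(tasks)
    steps = 0
    while ready and steps < max_steps:
        task = ready.popleft()
        try:
            next(task)  # run the task until its next yield
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)
        steps += 1


def greedy_reactor(log, iterations):
    # Stand-in for a loop that never yields while it is working.
    for _ in range(iterations):
        log.append("reactor")
    yield


def polite_reactor(log, iterations):
    # Same loop, but it yields once per iteration.
    for _ in range(iterations):
        log.append("reactor")
        yield


def request_handler(log, iterations):
    for _ in range(iterations):
        log.append("handler")
        yield


greedy_log = []
run_cooperatively(
    [greedy_reactor(greedy_log, 3), request_handler(greedy_log, 3)], max_steps=100
)

polite_log = []
run_cooperatively(
    [polite_reactor(polite_log, 3), request_handler(polite_log, 3)], max_steps=100
)

print(greedy_log)
print(polite_log)
```

In the greedy run, the handler gets no turn until the reactor's loop has finished. The real reactor loop runs for as long as the client is alive, so under cooperative scheduling the other coroutines would never get a turn at all.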

So, the possible solutions are:

  • Find a way to unpatch the threading module. I tried to find a way to do so, but I couldn’t. Even if we can find one, I am not sure how that would play with eventlet itself.
  • Perform a cooperative switch in the reactor's loop function. To do so, we need to add something like time.sleep(0) before the following line: https://github.com/hazelcast/hazelcast-python-client/blob/master/hazelcast/reactor.py#L45. time.sleep is monkey patched by eventlet, so the call performs the switch between coroutines. So, we need to monkey patch the loop function. It feels dirty to do so, but I think this is the safest way 😃

So, code like this needs to be executed only once, before you start any Hazelcast clients.

import asyncore
import hazelcast
import select
import time

from hazelcast.future import Future
from hazelcast.reactor import AsyncoreReactor

def patched_loop(self):
    # The reactor's loop function with one addition: the time.sleep(0) call.
    self.logger.debug("Starting Reactor Thread", extra=self._logger_extras)
    Future._threading_locals.is_reactor_thread = True
    while self._is_live:
        try:
            # Under eventlet, time.sleep is monkey patched, so this call
            # yields to the other coroutines on every iteration.
            time.sleep(0)
            asyncore.loop(count=1, timeout=0.01, map=self._map)
            self._check_timers()
        except select.error:
            self.logger.warning("Connection closed by server", extra=self._logger_extras)
        except:
            # Any other error is logged and stops the reactor.
            self.logger.exception("Error in Reactor Thread", extra=self._logger_extras)
            return
    self.logger.debug("Reactor Thread exited. %s" % self._timers.qsize(), extra=self._logger_extras)
    self._cleanup_all_timers()


AsyncoreReactor._loop = patched_loop
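Since the patch should be applied only once, one option is to guard the assignment so that repeated imports or repeated calls do nothing. The sketch below is an assumption, not something the client provides: patch_once, StandInReactor, and replacement_loop are hypothetical names, and a stand-in class is used so the example runs without Hazelcast installed.

```python
def patch_once(cls, attr_name, replacement):
    """Assign replacement to cls.<attr_name> the first time only.

    Returns True if the patch was applied, False if it was already in place.
    """
    marker = "_patched_" + attr_name
    if getattr(cls, marker, False):
        return False
    setattr(cls, attr_name, replacement)
    setattr(cls, marker, True)
    return True


class StandInReactor:
    """Hypothetical stand-in for the reactor class; not the real client class."""

    def _loop(self):
        return "original"


def replacement_loop(self):
    # Stand-in for patched_loop above.
    return "patched"


first = patch_once(StandInReactor, "_loop", replacement_loop)
second = patch_once(StandInReactor, "_loop", replacement_loop)
result = StandInReactor()._loop()
print(first, second, result)
```

With the real client, the equivalent call would be patch_once(AsyncoreReactor, "_loop", patched_loop), placed so that it runs before any Hazelcast client is created.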

Hope that helps
