Password Hash Cache not working as expected

Hi there. I seem to be in the minority, but the password caching mechanism does not work out of the box for me (#91 seems to imply that it works for some folks). I’m just using Flask’s built-in dev server (flask run).

The main issue seems obvious: the cache is stored in a werkzeug Local(), which (for me) gets wiped after every request. So the cache works, in a sense, but it is cleared after every request. This feels correct, because the Local() interface guards against accidentally sharing data across threads, whereas what I actually want is a cache shared between all threads, so that all my requests are fast.
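To make the difference concrete, here is a minimal sketch that uses the standard library's threading.local() as a stand-in for werkzeug's Local(). It assumes every request is served on a fresh thread; the names handle_request and simulate are hypothetical and not part of Flask-Security.

```python
import threading

# Stand-in for werkzeug's Local(): attributes set here are visible only to
# the thread that set them (assumption: each request runs on a fresh thread).
thread_local = threading.local()

# Module-level cache shared by every thread in this process.
shared_cache = set()
shared_lock = threading.Lock()


def handle_request(user_id, results):
    """Record whether each kind of cache already knew about user_id."""
    local_cache = getattr(thread_local, "cache", None)
    if local_cache is None:
        local_cache = set()
        thread_local.cache = local_cache
    local_hit = user_id in local_cache
    local_cache.add(user_id)

    with shared_lock:
        shared_hit = user_id in shared_cache
        shared_cache.add(user_id)

    results.append((local_hit, shared_hit))


def simulate(n_requests=3):
    """Serve n_requests sequentially, each on a new thread."""
    shared_cache.clear()
    results = []
    for _ in range(n_requests):
        worker = threading.Thread(target=handle_request, args=("alice", results))
        worker.start()
        worker.join()
    return results
```

Under these assumptions the per-thread cache misses on every request, while the shared cache hits from the second request onward.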

I fixed this by simply replacing the Local() call with a single, globally instantiated TTLCache-based cache (see below). I’ll probably end up throwing this into Redis (so that I can share the cache across like 20 workers).

I’ve found password verification to be a substantial bottleneck in my app: polling requests take at least 700 ms just to verify the hash (ugh) on every single auth-token request. That really bogs everything down if you’re trying to stack requests or poll relatively quickly (say, every couple of seconds). With a handful of requests across a handful of clients, forget about it. So I’m very interested in this feature, and in a clean implementation that works for me.

So… am I doing something different from everyone else or do I have different requirements?

Before:

local_cache = Local()

# ...
def _request_loader(request):
    # ...
    use_cache = cv("USE_VERIFY_PASSWORD_CACHE")

    if not user:
        return _security.login_manager.anonymous_user()
    if use_cache:
        cache = getattr(local_cache, "verify_hash_cache", None)
        if cache is None:  # it's ALWAYS None!
            cache = VerifyHashCache()
            local_cache.verify_hash_cache = cache
        if cache.has_verify_hash_cache(user):
            return user
        if verify_hash(data[1], user.password):
            cache.set_cache(user)
            return user
    else:
        if verify_hash(data[1], user.password):
            return user

After:

#local_cache = Local()
verifyCache = None

# ...
def _request_loader(request):
    # ...

    use_cache = cv("USE_VERIFY_PASSWORD_CACHE")

    if not user:
        return _security.login_manager.anonymous_user()
    if use_cache:
        global verifyCache
        if verifyCache is None:
            verifyCache = VerifyHashCache()
        if verifyCache.has_verify_hash_cache(user):
            return user
        if verify_hash(data[1], user.password):
            verifyCache.set_cache(user)
            return user
    else:
        if verify_hash(data[1], user.password):
            return user

    return _security.login_manager.anonymous_user()
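A module-level cache like the one in the “After” snippet is touched by several threads at once, so it may be worth guarding it with a lock and giving entries an expiry. The following is only a sketch of that idea using the standard library; SharedTTLCache and its methods are hypothetical names, not the VerifyHashCache API, and the injectable clock exists purely to make the expiry testable.

```python
import threading
import time


class SharedTTLCache:
    """Thread-safe set of keys that expire ttl_seconds after being added."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time
        self._lock = threading.Lock()
        self._expiry = {}              # key -> time at which the entry expires

    def add(self, key):
        with self._lock:
            self._expiry[key] = self._clock() + self._ttl

    def contains(self, key):
        with self._lock:
            expires_at = self._expiry.get(key)
            if expires_at is None:
                return False
            if self._clock() >= expires_at:
                # Entry is stale: drop it and report a miss.
                del self._expiry[key]
                return False
            return True


# Created once at import time, so there is no lazy-initialization race.
verify_cache = SharedTTLCache(ttl_seconds=300)
```

Such a cache is still per process, which is why the post mentions Redis for sharing it across multiple workers.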

Interestingly, the original Local()-based cache only partially fails on my production setup with gunicorn: it works only if I refresh the page (and thus make a couple of requests back to back). I assume this is because gunicorn may reuse the thread for requests from the same client within a window (or maybe Chrome is pipelining the HTTP requests over a single socket). However, if I just let it poll every few seconds, it consistently misses the cache (presumably because gunicorn is resetting the thread).

I’m curious what setup everyone else is using where this “just works” for them?

Issue Analytics

Created 4 years ago
Comments: 7 (5 by maintainers)

Top GitHub Comments

jwag956 commented, Jan 17, 2020

We now have robust and fast token generation/validation in the system (via the new user field fs_uniquifier). The current cache will be removed in 4.0.

Also - marking a user as inactive now ensures that no token or session will be authenticated.

jwag956 commented, Jul 20, 2019

Thanks for getting back. First - I have a WIP for overhauling CSRF support - almost done and I think that should solve/make easy lots of issues folks have had.

Second - the more I think about caching, the more I think this is the wrong approach. I merged in the cache PR as a stop-gap measure. The only reason it is slow is that it is attempting to cause the token to become inactive if the user changes their password. I haven’t researched why they did that - but I think that is completely the wrong direction. Tokens are for APIs used by scripts and other service-to-service operations. They should have nothing to do with a user’s password. Right now, I believe there is no way to disable a token (even marking a user as ‘inactive’ doesn’t do it, if I am reading the code correctly). Just to be clear - the expensive operation that is happening is bcrypt(), in order to compare the data in the token to the in-DB password to see if they match. This has nothing to do with whether the token is valid - that is handled via itsdangerous (seeing if the token was signed correctly).

API tokens should be independent of username/password and should be revocable. This is how OAuth2 works w.r.t. client credentials. I believe this is a much more scalable and secure model. I have been looking at incorporating an OAuth2 provider (at least the client credentials part) and completely removing the current authtoken implementation.

Since this is such a big issue, there are some stop-gaps that would be quick and easy. One would be to stop checking the password match on each request. That would make the token work for as long as its TTL. If we fix ‘inactive’ to mean that the user can no longer access the system, that might be enough for a start.

thoughts?
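The stop-gap described in the comment above (validate only the token's signature and age, with no password hash comparison per request) could look roughly like the following. This is an illustrative sketch built on the standard library's hmac module, not on itsdangerous and not Flask-Security's actual token format; SECRET_KEY, sign_token and verify_token are hypothetical names.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"hypothetical-secret"  # assumption: a server-side signing key


def _signature(payload):
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def sign_token(user_id, issued_at):
    """Return '<payload>.<signature>', both URL-safe base64 encoded."""
    payload = f"{user_id}:{int(issued_at)}".encode()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(_signature(payload)).decode()
    )


def verify_token(token, max_age, now):
    """Return the user id if the token is authentic and fresh, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None  # malformed token
    # Cheap constant-time signature check; no bcrypt involved.
    if not hmac.compare_digest(signature, _signature(payload)):
        return None
    user_id, _, issued_at = payload.decode().rpartition(":")
    if now - int(issued_at) > max_age:
        return None  # token outlived its TTL
    return user_id
```

With a scheme like this, revoking access would need a separate check (for example the ‘inactive’ flag mentioned above), since the token itself stays valid until its TTL runs out.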