Unexpected invalid_grant error on offline session refresh when maximum number of offline sessions is configured
Describe the bug
Some of our customers use many offline sessions, leading to very high memory consumption and long startup times. To reduce the startup time, we use the lazy offline-session loading implemented in the context of https://github.com/keycloak/keycloak/pull/8033.
To limit the memory consumption, we wanted to limit the offline-session caches (both “offlineSessions” and “offlineClientSessions”) to 10,000 entries.
But with this configuration, we got several unexpected errors when creating new sessions with offline tokens (i.e. refreshing offline tokens). The error response was:
{
"error": "invalid_grant",
"error_description": "Session doesn't have required client"
}
This happens because the current Keycloak code assumes that whenever an offline user session is in the cache, all of its offline client sessions are in the cache as well. That is not always the case, especially when there are multiple offline client sessions per offline user session, which happens when users log in to multiple clients within the same browser session.
Version
15.0.2, 17.0.0-SNAPSHOT
Expected behavior
Refreshing a valid offline token should always create a new online token (with the corresponding Keycloak offline user session and offline client session).
Actual behavior
When refreshing some valid offline tokens, sometimes the following error is returned:
{
"error": "invalid_grant",
"error_description": "Session doesn't have required client"
}
How to Reproduce?
I could reproduce the issue in the following way:
Use a standalone WildFly-based Keycloak installation.
Disable offline session preloading with the following CLI commands:
/subsystem=keycloak-server/spi=userSessions:add()
/subsystem=keycloak-server/spi=userSessions:write-attribute(name=default-provider,value=infinispan)
/subsystem=keycloak-server/spi=userSessions/provider=infinispan:write-attribute(name=properties.preloadOfflineSessionsFromDatabase,value=false)
Limit both offline session caches to a maximum of 2 entries:
/subsystem=infinispan/cache-container=keycloak/local-cache=offlineSessions/memory=heap:add(size=2)
/subsystem=infinispan/cache-container=keycloak/local-cache=offlineClientSessions/memory=heap:add(size=2)
Create a test realm.
In the test realm, create two clients client1 and client2 with “Standard Flow enabled”. Create three users (test1, test2, test3) with password credentials.
Create the following offline tokens. Make sure to save all of them for use in subsequent steps:
- Create offline token “client1 - test1”: Create the offline token with the authorization code flow, using scope offline_access and client_id=client1. I used the Postman OAuth 2.0 authorization feature for this purpose (https://learning.postman.com/docs/sending-requests/authorization/#oauth-20). Log in as user test1. Once you have the refresh token, make sure you close or revoke the created (online) browser session.
- Create offline token “client1 - test2”: Proceed as in the previous step, but using test2.
- Create offline token “client1 - test3”: Proceed as in the previous step, but using test3 and NOT closing or revoking the created (online) browser session.
- Create offline token “client2 - test3”: Change the client_id to client2 and reuse the browser session from the previous step. The token will be created without re-login.
These calls result in 3 offline user sessions and 4 client sessions created:
- test1 session with client1 session
- test2 session with client1 session
- test3 session with client1 and client2 sessions
To make sure you can reproduce the following steps, it’s best to restart Keycloak now.
In the next step, try to create a new access token for each of the offline tokens, in the order they were created, for example with curl:
curl --request POST '<kcBaseUrl>/auth/realms/test/protocol/openid-connect/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=refresh_token' \
--data-urlencode 'client_id=<clientId>' \
--data-urlencode 'refresh_token=<refreshToken>'
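For scripting the four refresh calls, the same request can be built in Python. This is only a sketch of the curl command above: the function names `build_refresh_request` and `send` are hypothetical, the realm name `test` comes from the curl URL, and the base URL is whatever `<kcBaseUrl>` stands for in your setup.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_refresh_request(kc_base_url, realm, client_id, refresh_token):
    """Build (but do not send) the token-endpoint POST from the curl example."""
    url = f"{kc_base_url.rstrip('/')}/auth/realms/{realm}/protocol/openid-connect/token"
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request):
    """Send the request and return (HTTP status, decoded JSON body)."""
    try:
        with urlopen(request) as resp:
            return resp.status, json.load(resp)
    except HTTPError as err:
        # Error responses such as invalid_grant also carry a JSON body.
        return err.code, json.load(err)
```

Calling `send(build_refresh_request(base_url, "test", client_id, token))` once per saved offline token, in creation order, mirrors the manual steps.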
- OK - cache is empty, both test1 session and its client1 session are loaded into the cache. Nothing is evicted, in both caches there is still space for one entry.
- OK - test2 session and its client1 session are loaded into the cache. Nothing is evicted, but neither cache has any space left.
- OK - test3 session and its client1 session are loaded into the cache. Test1 session and its client1 session are evicted.
- Failure - test3 session is still in the cache, but its client2 session is not. Because Keycloak currently does not load a client session from persistence when it is not found in the cache, the following error is returned:
{
"error": "invalid_grant",
"error_description": "Session doesn't have required client"
}
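The four outcomes above can be replayed with a small model. This is a simplified sketch, not Keycloak's code: it assumes plain LRU eviction (the real Infinispan eviction policy may differ), and `BoundedCache`, `refresh`, and `run_reproduction` are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict


class BoundedCache:
    """Tiny LRU cache standing in for a size-limited cache (assumption: LRU)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used


# Persisted state from the reproduction: 3 user sessions, 4 client sessions.
DB_USER_SESSIONS = {"test1", "test2", "test3"}
DB_CLIENT_SESSIONS = {
    ("test1", "client1"), ("test2", "client1"),
    ("test3", "client1"), ("test3", "client2"),
}


def refresh(user, client, user_cache, client_cache):
    """Model of the refresh path as described in this issue."""
    key = (user, client)
    if user_cache.get(user) is None:
        # User session miss: lazily load it plus the requested client session.
        if user not in DB_USER_SESSIONS or key not in DB_CLIENT_SESSIONS:
            return "invalid_grant"
        user_cache.put(user, True)
        client_cache.put(key, True)
        return "OK"
    # User session hit: the client session is looked up in the cache only.
    if client_cache.get(key) is None:
        return "invalid_grant"
    return "OK"


def run_reproduction():
    """Replay the four refresh calls against caches limited to 2 entries."""
    user_cache, client_cache = BoundedCache(2), BoundedCache(2)
    steps = [("test1", "client1"), ("test2", "client1"),
             ("test3", "client1"), ("test3", "client2")]
    return [refresh(u, c, user_cache, client_cache) for u, c in steps]
```

`run_reproduction()` returns `["OK", "OK", "OK", "invalid_grant"]`: the last call finds the test3 user session cached but not its client2 session, and nothing falls back to the database.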
Anything else?
Proposed solution
As I understood from @sschu, @stianst proposed changing the relation between offline user sessions and offline client sessions to 1:1 (for online sessions the relation should stay 1:N). This fixes the scenario described above, but the “invalid_grant” error could still occur with many parallel requests, or simply with “legacy” offline sessions (offline user sessions with multiple client sessions). That’s why I think it makes sense to load the client session(s) from persistence and update the cache whenever an offline client session cannot be found.
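The fallback idea can be sketched as follows. This is an illustrative model only, not the actual Keycloak implementation: `OfflineClientSessionLookup` is a hypothetical class, and the two dictionaries stand in for the offlineClientSessions cache and the database.

```python
class OfflineClientSessionLookup:
    """Hypothetical lookup that falls back to persistence on a cache miss."""

    def __init__(self, cache, persisted):
        self.cache = cache          # (user_session_id, client_id) -> session
        self.persisted = persisted  # stand-in for the database

    def get_client_session(self, user_session_id, client_id):
        key = (user_session_id, client_id)
        session = self.cache.get(key)
        if session is None:
            # Cache miss: try persistence before reporting "not found".
            session = self.persisted.get(key)
            if session is not None:
                self.cache[key] = session  # repopulate the cache
        return session
```

With a size-limited cache, writing the session back may evict another entry, which is harmless here because every miss takes the same fallback path.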
Issue Analytics
- Created 2 years ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
I was somehow able to reproduce the error with 20.0.0. I agree the issue should be addressed. Thomas summarized the impact of the issue nicely here: https://github.com/keycloak/keycloak/pull/8671#issuecomment-1145391371. However, I’m leaning towards the smallest possible fix, which is “making a DB fallback in InfinispanUserSessionProvider.getClientSession()”. Something like this PR: https://github.com/keycloak/keycloak/pull/8671. I wouldn’t change the 1:n relation between offline user session and offline client sessions due to the complexity, the current state of the legacy store (the new store will eliminate this issue), and the fact that it won’t eliminate the issue itself without the DB fallback.
I’ll schedule this issue for one of the following sprints. The goal would probably be to combine and simplify the two existing PRs and maybe provide a less complex test in the model testsuite that will test the DB fallback.
@martin-kanis Sounds good to me. I agree that at this time it probably doesn’t make much sense to invest in the old store. Is there any way we can support here?