question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[SIP-85] OAuth2 for databases

See original GitHub issue

[SIP-85] OAuth2 for databases

Motivation

Modern cloud databases (eg, Sowflake, BigQuery, and Google Sheets) support OAuth2 for user authentication and authorization. Connecting Superset to these databases today, specially Google Sheets, is a complicated multi-step process, requiring uses to create a service account that can potentially give access to all the sheets inside an organization (if “impersonate user” is not enabled).

Superset could offer a much better experience to users by supporting OAuth2 directly. Users would simply have to grant the necessary permissions to Superset, and Superset would store and use the access tokens when running queries. This would remove the complexity of setting up these databases, and solve the problem of risking giving access to confidential tables and/or sheets.

Proposed Change

We propose supporting OAuth2 directly in Superset. When accessing a resource (table or Google sheet) that requires OAuth2 the user would be redirected to the authorization server, where they can grant the necessary permissions. After OAuth2 the access and refresh tokens would be stored in table with grain of (database, user), and reused in subsequent connections.

The following video shows the process in action:

https://user-images.githubusercontent.com/1534870/172488708-06f99cf3-499f-4323-91cd-fa426259e06c.mov

New or Changed Public Interfaces

This proposal adds a new table and a new endpoint. It also modifies modify_url_for_impersonation to pass a new parameter, the OAuth2 access_token. The SIP will be first implemented for Google Sheets, but the implementation is database agnostic, with all database specific logic living in the engine spec.

To store the user tokens a new table database_user_oauth2_tokens would be created, with a corresponding SQLAlchemy model:

user_id database_id access_token access_token_expiration refresh_token
1 1 XXX 2022-06-06 09:00:00Z YYY

(The access token and refresh token are stored in encrypted columns, similar to other database credentials.)

Here the user with ID 1 has an access token XXX on the database with ID 1. When they run a query in that database Superset will pass the access token to the database engine spec via modify_url_for_impersonation. For Google Sheets to use this token it’s as simple as:

class GSheetsEngineSpec(SqliteEngineSpec):
    ...
    @classmethod
    def modify_url_for_impersonation(
        cls,
        url: URL,
        impersonate_user: bool,
        username: Optional[str],
        access_token: Optional[str] = None,  # added in this SIP
    ) -> None:
        if impersonate_user and access_token:
            url.query["access_token"] = access_token

Different DB engine specs might have to handle the token in a different way.

When passing the token to the DB engine spec Superset will check if it hasn’t expired. If it has expired and a refresh token is available Superset will refresh the token and store the new one by calling a DB engine spec method. For Google Sheets:

class GSheetsEngineSpec(SqliteEngineSpec):
    ...
    @staticmethod
    def get_oauth2_fresh_token(refresh_token: str) -> OAuth2TokenResponse:
        response = http.request(  # type: ignore
            "POST",
            "https://oauth2.googleapis.com/token",
            fields={
                "client_id": current_app.config["GSHEETS_OAUTH2_CLIENT_ID"],
                "client_secret": current_app.config["GSHEETS_OAUTH2_CLIENT_SECRET"],
                "refresh_token": refresh_token,
                "grant_type": "refresh_token",
            },
        )
        return json.loads(response.data.decode("utf-8"))

The refresh token logic is part of the OAuth2 spec, but the details of how to do it are left to each DB engine spec. For Google Sheets we need to post a payload to https://oauth2.googleapis.com/token, as the example above shows.

How are these tokens initially obtained and stored? Each DB engine spec should declare if it supports OAuth2 via a is_oauth2_enabled method, for example:

from shillelagh.exception import UnauthenticatedError

class GSheetsEngineSpec(SqliteEngineSpec):
    ...
    @staticmethod
    def is_oauth2_enabled() -> bool:
        return (
            "GSHEETS_OAUTH2_CLIENT_ID" in current_app.config
            and "GSHEETS_OAUTH2_CLIENT_SECRET" in current_app.config
        )

    oauth2_exception = UnauthenticatedError

(The default method in the base class returns False.)

Each DB engine spec also defines an exception oauth2_exception that is raised when OAuth2 is needed (UnauthenticatedError in the example above). Superset will capture this exception and start the OAuth2 flow by raising a custom error that the frontend understands:

class BaseEngineSpec:
    ...
    def execute(
        cls,
        cursor: Any,
        query: str,
        database_id: int,
        **kwargs: Any,
    ) -> None:
        try:
            cursor.execute(query)
        except cls.oauth2_exception as ex:
            if cls.is_oauth2_enabled():
                oauth_url = cls.get_oauth2_authorization_uri(database_id)
                raise OAuth2RedirectError(oauth_url) from ex

Here the base engine spec is calling the method get_oauth2_authorization_uri that is database specific — this returns the URL of the authorization server.

When authenticating with OAuth2 it’s common to pass a state parameter to the authorization server to, as the name suggests, maintain state. That parameter is returned unmodified from the authorization server, and we can use it to verify that the request is valid and came from Superset. In this implementation, our state is a JWT signed with current_app.config["SECRET_KEY"], with the following payload:

{"user_id" 1, "database_id" 1}

The user should be redirected back from the authorization service to a new endpoint, api/v1/databases/oauth2/. That endpoint will receive the state JWT and a code query argument. The backend validates the JWT and extracts the user and database IDs. It then exchanges the code for the access and refresh tokens. Since this is database specific we call a method on the DB engine spec; for Google Sheets:

class GSheetsEngineSpec(SqliteEngineSpec):
    ...
    @staticmethod
    def get_oauth2_token(code: str) -> OAuth2TokenResponse:
        redirect_uri = current_app.config.get(
            "GSHEETS_OAUTH2_REDIRECT_URI",
            url_for("DatabaseRestApi.oauth2", _external=True),
        )

        response = http.request(  # type: ignore
            "POST",
            "https://oauth2.googleapis.com/token",
            fields={
                "code": code,
                "client_id": current_app.config["GSHEETS_OAUTH2_CLIENT_ID"],
                "client_secret": current_app.config["GSHEETS_OAUTH2_CLIENT_SECRET"],
                "redirect_uri": redirect_uri,
                "grant_type": "authorization_code",
            },
        )
        return json.loads(response.data.decode("utf-8"))

Note here that we have an optional GSHEETS_OAUTH2_REDIRECT_URI configuration key. This should not be used in most situations (but is needed at Preset, where we manage multiple Superset instances).

After exchanging the code for the access and refresh tokens they get saved to the table database_user_oauth2_tokens.

New dependencies

No new dependencies, though the reference implementation uses a few dependencies-of-dependencies not listed explicitly in Superset’s setup.py (urllib3 and yarl).

Migration Plan and Compatibility

A new migration is needed to create the database_user_oauth2_tokens table.

Rejected Alternatives

There are not many alternative ways to solve the problem of authenticating to a database using OAuth2. The table is needed, since we store personal tokens at the (database, user) grain. We need a new endpoint, to receive the code from the authorization server. Passing the access_token to modify_url_for_impersonation seems like the cleanest way of supporting OAuth2 without having to refactor all the engine specs.

Issue Analytics

  • State:open
  • Created a year ago
  • Reactions:12
  • Comments:13 (6 by maintainers)

github_iconTop GitHub Comments

1reaction
Samira-Elcommented, Dec 5, 2022

Hi,

We’ll create a base implementation, and then people can add support on per-database basis. My idea is to start with GSheets and Snowflake, but it should be easy to add support for more databases.

Wondering if the implementation in progress is taking into consideration how having oauth for the datasources impacts scheduling alerts/reports and if anything is being done around this?

1reaction
michael-s-molinacommented, Jul 12, 2022

Currently, like shown in this SIP, we’re saving the tokens to the metadata db, in hindsight, I suggest adding the option to save them to the cache instead, since they expire anyway and to avoid multiple trips to the DB to fetch them throughout the period the user is querying the data source.

@betodealmeida If we choose to use the cache to store this information, our custom flask caching backend can store data in the key_value table in case a client does not support any of the available backends 😉

Read more comments on GitHub >

github_iconTop Results From Across the Web

Re: [VOTE] [DISCUSS] [SIP-85] OAuth2 for databases
I would like to call a vote for SIP-85, introducing OAuth2 > authentication for databases: > > https://github.com/apache/superset/issues/20300 > > Thank you ...
Read more >
Sip85 | Wines, Taps, Food in Palm Harbor, FL
Intimate, modern wine bar & restaurant offering New American small plates, steaks, chops & seafood.
Read more >
About | Sip85 in Palm Harbor, FL
About. Click on the button below to learn about our story and read our customer reviews. Our Story. Scroll Down to Content. a...
Read more >
Contact | Sip85 in Palm Harbor, FL
Send us a message and we'll get back to you as soon as possible. You can also reach us by phone at (727)...
Read more >
Reservations | Sip85 in Palm Harbor, FL
Reservations. Please reserve a table in the button below. For parties of 7 or more guests, please call us directly at 727-270-7570 to...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found