Generic DBT_ACCESS_TOKEN in dbt Workflow does not have access to catalogs
Context
- AWS Cloud
- Enterprise Plan
- We are using Unity Catalog
- An authenticated user without a permission group has access to no catalogs
- Only authenticated users with a permission group can access respective catalogs and databases
- I am a consultant engaged with a client
- This client has regulatory requirements that prevent the feasibility of dbt cloud
Triage Steps
Following the instructions for running dbt-core as a dbt task in a Workflow, outlined here:
https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-workflows.md
✅ Local Dev + SQL Warehouses works
I can get this to work locally when I set the http_path to a SQL Warehouse endpoint and inject my PAT into profiles.yml. This works because my PAT and the SQL Warehouses are assigned to the correct permission groups in our workspace.
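As a concrete sketch of the working local setup — the profile name, host, schema, and catalog values below are illustrative placeholders, not taken from the original report:

```yaml
databricks_dev:
  target: dev
  outputs:
    dev:
      type: databricks
      host: <workspace_host>.cloud.databricks.com
      http_path: /sql/1.0/endpoints/<endpoint_id>   # SQL Warehouse endpoint
      token: "{{ env_var('MY_PAT') }}"              # PAT injected via env var, not hard-coded
      schema: <schema_name>
      catalog: <catalog_name>                       # Unity Catalog target
```

Because the PAT belongs to a user in the correct permission group, this connection can see the expected catalogs.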
😭 dbt Workflows does not work when…
This is because the cluster generates generic credentials for:
- DBT_HOST
- DBT_ORG_ID
- DBT_CLUSTER_ID
- DBT_ACCESS_TOKEN

It seems that the access token does not have the correct permission group associated with it, and there is no way to add that association.
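The linked workflow docs wire these generated values into profiles.yml via env_var(). A minimal sketch, assuming a profile and schema name of my choosing (only the four env vars come from the Workflow itself):

```yaml
databricks_cluster:
  target: workflow
  outputs:
    workflow:
      type: databricks
      host: "{{ env_var('DBT_HOST') }}"
      http_path: "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}"
      token: "{{ env_var('DBT_ACCESS_TOKEN') }}"
      schema: <schema_name>
```

With this wiring, the catalog permissions the job sees are entirely determined by whatever identity DBT_ACCESS_TOKEN carries.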
I tried hard-coding a PAT that I know works, but instead of using /sql/1.0/endpoints/<endpoint_id> I pointed it at the cluster endpoint "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}". This also did not work.
I could go back to having the workflow use the SQL Warehouse endpoint and a PAT, but that defeats the purpose of spinning up a Spark cluster for the job only to send the work to another (more expensive) Spark cluster, right?
Triage Summary
Here is a summary table of the test cases:

| token type | endpoint type | outcome |
|---|---|---|
| PAT | SQL Warehouse | ✅ |
| Generic Access Token | Cluster Endpoint | 😭 |
| PAT | Cluster Endpoint | 😭 |
Definitions:
- Generic Access Token --> DBT_ACCESS_TOKEN
- SQL Warehouse --> /sql/1.0/endpoints/<endpoint_id>
- Cluster Endpoint --> "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}"
Further digging:
It seems the dbt-databricks team has left a TODO about a similar conundrum, so we are all on the same page:

```python
# TODO: what is the error when a user specifies a catalog they don't have access to
conn: DatabricksSQLConnection = dbsql.connect(
    server_hostname=creds.host,
    # … (excerpt continues in the dbt-databricks source)
```
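To make the triage table reproducible, here is a minimal probe sketch. It assumes the databricks-sql-connector package; the helper names are mine, not from dbt-databricks:

```python
def cluster_http_path(org_id: str, cluster_id: str) -> str:
    """Build the all-purpose cluster SQL endpoint path used in the tests above."""
    return f"sql/protocolv1/o/{org_id}/{cluster_id}"


def show_catalogs(host: str, http_path: str, token: str) -> list:
    """Return the catalogs visible to a given token/endpoint pair.

    Requires the databricks-sql-connector package; imported lazily so the
    path helper above stays dependency-free.
    """
    from databricks import sql

    conn = sql.connect(
        server_hostname=host, http_path=http_path, access_token=token
    )
    try:
        cursor = conn.cursor()
        try:
            cursor.execute("SHOW CATALOGS")
            return cursor.fetchall()
        finally:
            cursor.close()
    finally:
        conn.close()
```

Running show_catalogs with the injected DBT_ACCESS_TOKEN against cluster_http_path(DBT_ORG_ID, DBT_CLUSTER_ID) should reveal whether the generated token can see any Unity Catalog catalogs at all — the failing cells in the triage table.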
Expected Behaviour
- As a user I would expect documentation around how the generic DBT_ACCESS_TOKEN is generated.
- As a user I would expect the dbt Workflow task to have some way to grant it permissions.
- As a user I would expect some documentation around the cluster SQL endpoints, as well as a breadcrumb trail of links from the related user guides and the dbt task documentation.
Extra
I do realise that my expectations lie with the Databricks team and not with this library package. However, it also makes sense to track this issue in public so that others can follow along, as it will undoubtedly link to some code changes too.
I have an active use case I am piloting with a client, so I am very keen to act as a beta tester in our dev environment, where there is low risk and we can provide real-world feedback.
Issue Analytics
- State:
- Created a year ago
- Reactions: 3
- Comments: 15 (7 by maintainers)
Top GitHub Comments
You’ve set the standard for bug reports. Thanks @neozenith !!
Thanks for the awesome bug report @neozenith! The Workflows UC support is being looked into.