question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Generic DBT_ACCESS_TOKEN in dbt Workflow does not have access to catalogs

See original GitHub issue

Context

  • AWS Cloud
  • Enterprise Plan
  • We are using Unity Catalog
  • An authenticated user without a permission group has access to no catalogs
  • Only authenticated users with a permission group can access respective catalogs and databases
  • I am a consultant engaged with a client
  • This client has regulatory requirements that prevent the feasibility of dbt cloud

Triage Steps

Following the instructions outlined for running dbt-core as a dbt Task Workflow here: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-workflows.md

✅ Local Dev + SQL Warehouses works

I can get this to work locally when I specify the http_path to be a SQL Warehouse endpoint and I use my PAT injected into the profiles.yml This works because my PAT and the SQL Warehouses are assigned to the correct permissions groups in our workspace

😭 dbt Workflows does not work when…

This is because the cluster generates credentials generically for:

  • DBT_HOST,
  • DBT_ORG_ID,
  • DBT_CLUSTER_ID,
  • DBT_ACCESS_TOKEN

It seems that access token does not have the correct permission group associated and there is no way to add the association.

I tried hard coding a PAT that I know works but instead of using the /sql/1.0/endpoints/<endpoint_id> I tried it against the cluster endpoint "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}".

This also did not work.

I could go back to having the workflow use the SQL Warehouse Endpoint and a PAT but that defeats the purpose of spinning up a spark cluster for a job only to send the work to another (more expensive) spark cluster right?

Triage Summary

Here is a summary table of test cases

token type endpoint type outcome
PAT SQL Warehouse
Generic Access Token Cluster Endpoint 😭
PAT Cluster Endpoint 😭

Definitions:

  • Generic Access Token --> DBT_ACCESS_TOKEN
  • SQL Warehouse --> /sql/1.0/endpoints/<endpoint_id>
  • Cluster Endpoint --> "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}"

Further digging:

It seems the dbt-databricks team have left a TODO with a similar conundrum so we are all on the same page it seems.

https://github.com/databricks/dbt-databricks/blob/main/dbt/adapters/databricks/connections.py#L408-L420

                # TODO: what is the error when a user specifies a catalog they don't have access to
                conn: DatabricksSQLConnection = dbsql.connect(
                    server_hostname=creds.host,

Expected Behaviour

  • As a user I would expect documentation around how the generic DBT_ACCESS_TOKEN is generated.
  • As a user I would expect the dbt Workflow task to have some way to grant it permissions
  • As a user I would expect some documentation around the cluster SQL endpoints as well as a bread crumb trail of links from the related user guides and dbt task documentation

Extra

I do realise that my expectations lie with the Databricks team and not with this library package. However it also makes sense to track this issue in public so that others can follow along as it will undoubtedly link to some code changes too.

I have an active use case I am piloting with a client so very keen to act as a beta tester in our dev environment so there is low risk and we can provide real world feedback.

Issue Analytics

  • State:closed
  • Created a year ago
  • Reactions:3
  • Comments:15 (7 by maintainers)

github_iconTop GitHub Comments

1reaction
bilalaslamseattlecommented, Sep 15, 2022

You’ve set the standard for bug reports. Thanks @neozenith !!

1reaction
superdupershantcommented, Sep 13, 2022

Thanks for the awesome bug report @neozenith! The Workflows UC support is being looked into.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Frequently asked questions | dbt Developer Hub - dbt Docs
Our Frequently Asked Questions (FAQs) section is a space where you can find an answer to some questions we get asked a lot...
Read more >
focus on generic outcomes · dbt-labs/dbt-core@ab4f396
it is dynamically generated by change type and timestamp. # This workflow should not require any secrets since it runs for PRs. #...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found