Replace information_schema queries with faster alternatives on Snowflake
Describe the feature
When dbt starts, it runs a query against the `information_schema` for every schema in the project. This happens even if the run involves a single model (single schema).
Each of these queries is taking anywhere from 4-20 seconds, presumably depending on how much load the overall Snowflake system has across accounts.
These queries seem to be running on the main thread and are therefore sequential. We have a project with 9 schemas with a time-to-first-model of close to 90 seconds. As you can imagine, this is a huge productivity drag.
We are contacting Snowflake about speeding up `information_schema` queries, but this could also be improved if dbt ran these queries in multiple threads and only ran queries for the schemas involved in the given run.
Also, I believe the `show tables` or `show views` commands could be used in this particular case (these take on the order of 100-200 ms) instead of queries to the information schema.
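For illustration, here is a minimal sketch (not dbt's actual code) of what a `show`-based lookup could look like with snowflake-connector-python. The connection parameters are placeholders; the database and schema names are the ones from the log below.

```python
# Minimal sketch (not dbt's code): listing relations with `show` commands
# instead of information_schema, using snowflake-connector-python.
# Connection parameters are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    database="aibi_analytics_db",
)
cur = conn.cursor()

# `show` commands read object metadata directly and typically come back in
# well under a second, versus several seconds for an information_schema scan.
cur.execute("show tables in schema aibi_analytics_db.dbt_pedro_sol_matching")
tables = cur.fetchall()

cur.execute("show views in schema aibi_analytics_db.dbt_pedro_sol_matching")
views = cur.fetchall()

print(f"{len(tables)} tables, {len(views)} views")
```

Note that the identifiers above are unquoted, so Snowflake resolves them case-insensitively; exact quoting is discussed further down in the thread.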
Below is one of these queries which took over 12 seconds:
2019-10-29 12:00:19,554 (MainThread): Acquiring new snowflake connection "list_relations_without_caching".
2019-10-29 12:00:19,554 (MainThread): Re-using an available connection from the pool.
2019-10-29 12:00:19,554 (MainThread): Using snowflake connection "list_relations_without_caching".
2019-10-29 12:00:19,554 (MainThread): On list_relations_without_caching: BEGIN
2019-10-29 12:00:20,197 (MainThread): SQL status: SUCCESS 1 in 0.64 seconds
2019-10-29 12:00:20,197 (MainThread): Using snowflake connection "list_relations_without_caching".
2019-10-29 12:00:20,197 (MainThread): On list_relations_without_caching: select
table_catalog as database,
table_name as name,
table_schema as schema,
case when table_type = 'BASE TABLE' then 'table'
when table_type = 'VIEW' then 'view'
when table_type = 'MATERIALIZED VIEW' then 'materializedview'
else table_type
end as table_type
from aibi_analytics_db.information_schema.tables
where table_schema ilike 'dbt_pedro_sol_matching'
and table_catalog ilike 'aibi_analytics_db'
2019-10-29 12:00:32,862 (MainThread): SQL status: SUCCESS 19 in 12.66 seconds
Describe alternatives you’ve considered
I inquired whether a macro could be used to override the information schema queries but was told it’s not possible.
Additional context
Snowflake
Who will this benefit?
This will speed up time-to-first-model for Snowflake projects with multiple schemas.
@drewbanin Thanks for looking into this.
What if you run `show schemas in database <database>` first and then do the case-insensitive search in Python to find the correct capitalization of the schema name? Then you can use it to run `show tables in schema <database>.<schema>` with the correct capitalization.

If you give me some pointers on where to go in the code, I could take a stab at this and create a PR over the next couple of weeks.
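A rough sketch of that two-step approach (my own illustration, not a patch against dbt): `cur` is assumed to be an open snowflake-connector-python cursor, and the column positions assume Snowflake's documented `show schemas` / `show tables` output.

```python
# Rough sketch of the two-step approach described above (not dbt's code).
# `cur` is an open snowflake-connector-python cursor.
def list_tables(cur, database, schema):
    # Unquoted identifiers resolve case-insensitively, so this works even
    # if `database` does not match the stored capitalization exactly.
    cur.execute(f"show schemas in database {database}")

    # In `show schemas` output, column 2 is the schema name and column 5
    # is the database name (both in their exact stored capitalization).
    match = next(
        (row for row in cur.fetchall() if row[1].lower() == schema.lower()),
        None,
    )
    if match is None:
        return []
    exact_schema, exact_database = match[1], match[4]

    # Quote the identifiers so the exact capitalization is preserved.
    cur.execute(f'show tables in schema "{exact_database}"."{exact_schema}"')
    return [row[1] for row in cur.fetchall()]  # column 2 is the table name
```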
Hey @pedromachados - I'm not sure that we'll want to start with parallelizing these queries - I'd be much more in favor of using `show schemas`, `show tables`, etc. in lieu of `information_schema` queries! Even if we did parallelize these, if one of them takes 20 seconds to complete, that's still too slow for us to work with IMO.

I think we discussed this on Slack, but there are some real challenges we'd need to account for in using `show ...` instead of `select .. from information_schema.<table>`.

For one, the `show ...` queries only return a maximum of something like 10k objects. If we tried to run `show tables in database ...`, there's a super real chance that we'd hit this maximum in even moderately sized warehouses! So, we'd need to use `show tables in schema <database>.<schema>`, which is also challenging because we'd need to quote these identifiers exactly correctly. This is super doable for dbt, but quoting on Snowflake is always a big pain!

For two, `show columns` returns different data than the results returned from the information schema. This might be tractable for us, but it's a big change for us to make!

I'm super keen to make this change - going to queue it up for a future release.
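For what it's worth, the per-schema quoting this would require might look something like the sketch below. `quote_identifier` and `show_tables_sql` are hypothetical helpers of my own, not dbt's implementation; the point is that embedded double quotes are doubled, and scoping `show tables` to one schema keeps each result under the row cap mentioned above.

```python
# Illustration of the quoting concern (hypothetical helpers, not dbt's code).
# Snowflake quoted identifiers preserve case; embedded double quotes are
# escaped by doubling them.
def quote_identifier(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'

def show_tables_sql(database: str, schema: str) -> str:
    # Scoping `show tables` to a single schema keeps each result well under
    # the ~10k row cap that a database-wide `show tables` could hit.
    return (
        "show tables in schema "
        f"{quote_identifier(database)}.{quote_identifier(schema)}"
    )

print(show_tables_sql("aibi_analytics_db", "dbt_pedro_sol_matching"))
# -> show tables in schema "aibi_analytics_db"."dbt_pedro_sol_matching"
```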