expression_is_true is costly when applied to a large table
Describe the bug
When running an expression_is_true test, I noticed that the test processed ~500 GB of data (on BigQuery), which in my opinion is extremely costly for a simple test.
Because of the way the test is set up (SELECT * in the last statement), the total cost of the test is the same as running SELECT * FROM TABLE_THAT_WE_TEST, which can be quite expensive for long and wide tables.
Steps to reproduce
- Create a large table
- Run SELECT * FROM LARGE_TABLE on BQ and check the cost
- Run an expression_is_true test against LARGE_TABLE and see that it is as costly as the SELECT *
Expected results
I expect a simple test to be really cheap. I do not want to take into account the cost of a simple column test when developing.
Actual results
It's expensive.
System information
The contents of your packages.yml file:
Which database are you using dbt with?
- [ ] postgres
- [ ] redshift
- [x] bigquery
- [ ] snowflake
- [ ] other (specify: ____________)
The output of dbt --version:
<output goes here>
Additional context
Are you interested in contributing the fix?
Sure!
expression_is_true.sql:
{% test expression_is_true(model, expression, column_name=None, condition='1=1') %}
{# T-SQL has no boolean data type so we use 1=1 which returns TRUE #}
{# ref https://stackoverflow.com/a/7170753/3842610 #}
{{ return(adapter.dispatch('test_expression_is_true', 'dbt_utils')(model, expression, column_name, condition)) }}
{% endtest %}
{% macro default__test_expression_is_true(model, expression, column_name, condition) %}
with meet_condition as (
    select * from {{ model }} where {{ condition }}
)

select
    -- Change the * to a fixed single column. There might be some relevant info
    -- you could pass here for debugging, but just not ALL columns.
    1
from meet_condition
{% if column_name is none %}
where not({{ expression }})
{%- else %}
where not({{ column_name }} {{ expression }})
{%- endif %}
{% endmacro %}
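A dbt test passes when its query returns zero rows, so swapping the final `select *` for a constant should not change which tests pass or fail. The sketch below is a rough local sanity check of that claim, not part of dbt_utils: it renders the macro body by hand in Python and runs both variants against an in-memory SQLite table. The table `large_table`, its columns, and the helper `failing_rows` are hypothetical, and SQLite only stands in for BigQuery here, so it says nothing about bytes billed.

```python
import sqlite3

# Hypothetical stand-in for the model under test. sqlite3 is used only so the
# sketch runs locally; it does not model BigQuery's bytes-scanned billing.
conn = sqlite3.connect(":memory:")
conn.execute("create table large_table (id integer, amount integer, payload text)")
conn.executemany(
    "insert into large_table values (?, ?, ?)",
    [(1, 10, "a"), (2, -5, "b"), (3, 7, "c"), (4, -1, "d")],
)


def failing_rows(select_list, expression, condition="1=1"):
    """Hand-render the macro body and return the rows that violate the expression."""
    query = f"""
        with meet_condition as (
            select * from large_table where {condition}
        )
        select {select_list}
        from meet_condition
        where not({expression})
    """
    return conn.execute(query).fetchall()


# Original form (select *) versus the proposed form (select 1).
star_rows = failing_rows("*", "amount >= 0")
constant_rows = failing_rows("1", "amount >= 0")

# Both variants report the same number of failing rows.
print(len(star_rows), len(constant_rows))
```

Both variants return one row per violation; only the width of each returned row differs, which is the part that drives the scan cost on a columnar warehouse.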
@elyobo my gut reaction here is that this is something that would make sense to define across multiple tests, so I’d recommend you open an issue in dbt-core. Something like this would be cool:
And then we’d have something like
BTW, I haven't checked whether the same thing is happening in other tests. It might be valuable to check this.