Describe the feature

Create a feature to specify “test groups”, a shorthand for declaring several related tests on a model at once. Currently, Fishtown best practices recommend that you specify unique and not_null tests on the primary key of each of your models; these tests are logically related. This feature would allow you to specify just a primary key test group on a single field, and dbt would automatically generate and compile both a unique and a not_null test for that model on a dbt test invocation. This could be extended to other logically grouped tests on single fields.

Describe alternatives you’ve considered

Alternatives include explicitly calling each test, as it works today! This is still absolutely a viable approach, and has the benefit of forcing analysts to explicitly declare their assumptions about their data.

Additional context

Example:

current state:

version: 2

models:
    - name: cool_data_model
      columns:
          - name: id
            tests:
                - unique
                - not_null

proposed:

version: 2

models:
    - name: cool_data_model
      primary_key: id

Obviously the example is up for debate. It might be worth keeping column-level definitions in there, and maybe adding something that explicitly says the word test. But generally, a single specification could make the definition of these tests more concise.
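To make the proposal concrete, here is a minimal Python sketch of the expansion it implies: a parsed schema.yml dict that uses the proposed `primary_key` shorthand is rewritten into the explicit column-level tests shown under "current state". This is not how dbt parses schema files; `TEST_GROUPS` and `expand_test_groups` are hypothetical names invented for illustration.

```python
import copy

# Hypothetical mapping from a test-group name to the tests it stands for.
TEST_GROUPS = {"primary_key": ["unique", "not_null"]}


def expand_test_groups(schema):
    """Return a copy of a parsed schema.yml dict in which the proposed
    `primary_key` shorthand is rewritten as explicit column-level tests."""
    expanded = copy.deepcopy(schema)
    for model in expanded.get("models", []):
        key_column = model.pop("primary_key", None)
        if key_column is None:
            continue
        columns = model.setdefault("columns", [])
        # Reuse the column entry if it is already declared, else create it.
        column = next((c for c in columns if c["name"] == key_column), None)
        if column is None:
            column = {"name": key_column}
            columns.append(column)
        tests = column.setdefault("tests", [])
        for test in TEST_GROUPS["primary_key"]:
            if test not in tests:
                tests.append(test)
    return expanded


# The "proposed" example from the issue, as an already-parsed dict.
proposed = {
    "version": 2,
    "models": [{"name": "cool_data_model", "primary_key": "id"}],
}
current = expand_test_groups(proposed)
print(current)
```

The input dict is left untouched, and tests already declared on the column are not duplicated, so the shorthand and explicit column-level definitions could coexist in one file.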

Who will this benefit?

This is primarily for analytics engineers to clean up the definitions in a schema.yml file.

Are you interested in contributing this feature?

For sure!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
jtcohen6 commented, Nov 12, 2020

Neat idea @dave-connors-3! What you’re proposing sounds along the lines of a built-in YAML anchor, i.e.

models:
  - name: cool_data_model
    columns:
      - name: id
        tests: &primary_key
          - unique
          - not_null
  - name: my_other_model
    columns:
      - name: id
        tests: *primary_key

Without, of course, having to define the initial &primary_key in each file.

I heard a related idea the other day, with a slightly different angle, and I’m now convinced that the two might be connected.

Right now, dbt runs separate queries for each of unique, not_null, etc, even if they’re all defined on the same column in the same table. What if there was a way for dbt to consolidate those tests into a single query?

Essentially, I’m thinking of a genuinely different custom schema test that acts as a “combo” of existing builtin tests:

models:
  - name: cool_data_model
    columns:
      - name: id
        tests:
          - primary_key

Then:

{% macro test_primary_key(model, column_name) %}

    with potential_dupes as (

        select
        
            {{ column_name }},
            count(*) as num_rows
        
        from {{ model }}
        group by 1
        
    )
    
    select coalesce(sum(num_rows), 0)   -- 0 rather than null when no rows fail
    from potential_dupes
    where {{ column_name }} is null   -- not_null
      or num_rows > 1                 -- unique
    
{% endmacro %}

Hypothetically, it would be more efficient in terms of database time and resources. The downside is ambiguity: should the test fail, it could be because the column was null, or not unique, or both.

What do you think?
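One way to reduce that ambiguity, sketched here under assumptions rather than as anything dbt does, is to keep the single table scan but return one count per failure reason. The example runs a variant of the combined query against an in-memory SQLite table; `COMBINED_TEST_SQL` and `run_primary_key_test` are hypothetical names, and the table and column names are hard-coded where a dbt macro would use `{{ model }}` and `{{ column_name }}`.

```python
import sqlite3

# Hypothetical variant of the combined query: one pass over the table,
# but a separate count for each failure reason. Nulls are counted only
# under not_null, never as duplicates (an assumption of this sketch).
COMBINED_TEST_SQL = """
with grouped as (
    select id, count(*) as num_rows
    from cool_data_model
    group by id
)
select
    coalesce(sum(case when id is null then num_rows else 0 end), 0),
    coalesce(sum(case when id is not null and num_rows > 1
                      then num_rows else 0 end), 0)
from grouped
"""


def run_primary_key_test(conn):
    """Run the combined check and report failing row counts per reason."""
    null_rows, duplicate_rows = conn.execute(COMBINED_TEST_SQL).fetchone()
    return {"not_null": null_rows, "unique": duplicate_rows}


conn = sqlite3.connect(":memory:")
conn.execute("create table cool_data_model (id integer)")
conn.executemany(
    "insert into cool_data_model (id) values (?)",
    [(1,), (2,), (2,), (None,)],
)
failures = run_primary_key_test(conn)
print(failures)  # {'not_null': 1, 'unique': 2}
```

The trade-off is that the test now returns two numbers instead of one, so whatever consumes the result would need to know how to report them.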

0 reactions
snajjar commented, Oct 19, 2022

I’m facing the same problem currently. I’d like to add tests to my repo to enforce (for instance) compliance with the dbt style guide, but what I end up doing is adding a lot of tests to EVERY model in dbt (100+ and counting).

Since those are the same tests every time, but have to be invoked from different model yml files, I couldn’t find a solution using YAML anchors. I don’t see a better solution than having the option of defining a test group (@noel I’ll be really explicit on the naming!).
