Closed in favour of individual issues - Support for all major syntactical opinions of the dbt coding conventions, a comprehensive breakdown.
Why?
I’d be really interested to conduct a survey of current sqlfluff users to find out where they are applying sqlfluff linting. Anecdotally everyone I know who uses sqlfluff applies it to a dbt project, and I’m confident that fully supporting their style guide would dramatically increase uptake.
https://github.com/fishtown-analytics/corp/blob/master/dbt_coding_conventions.md
From what is reasonable to programmatically identify, we are currently missing:
CTEs
- All {{ ref(‘…’) }} statements should be placed in CTEs at the top of the file
- CTEs should be formatted like this:
with

events as (

    ...

),

-- CTE comments go here
filtered_events as (

    ...

)

select * from filtered_events
In the example here the rules are made clearer, showing that CTEs using {{ ref() }} should be select * statements at the top of the file. While the {{ ref() }} function can only be explicitly identified pre-templating, using an internal jinja function to template {{ ref() }} to a parser-recognisable table reference function should solve this. We could then just perform checks on whether the select statement contains only a *, and whether the CTE appears below any CTEs with explicit column selections. I think there would likely be a rule for each.
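To make the two checks concrete, here is a rough sketch in plain Python. It works on the raw, pre-templating SQL with regular expressions and naive parenthesis matching rather than on sqlfluff's parse tree, so every name in it (`extract_ctes`, `check_ref_ctes`, the patterns) is hypothetical and not part of the sqlfluff API. It also approximates "CTEs with explicit column selections" as "any CTE that does not call {{ ref() }}".

```python
import re

# Hypothetical patterns for illustration only; a real rule would use the parse tree.
CTE_START = re.compile(r"(\w+)\s+as\s*\(", re.IGNORECASE)
REF_CALL = re.compile(r"\{\{\s*ref\(.*?\)\s*\}\}")
SELECT_STAR_FROM_REF = re.compile(
    r"^\s*select\s+\*\s+from\s+\{\{\s*ref\(.*?\)\s*\}\}\s*$",
    re.IGNORECASE | re.DOTALL,
)


def extract_ctes(sql):
    """Return (name, body) pairs for each CTE using naive parenthesis matching."""
    ctes = []
    pos = 0
    while True:
        match = CTE_START.search(sql, pos)
        if match is None:
            break
        depth, i = 1, match.end()
        # Walk forward until the parenthesis opened by "as (" is closed.
        while i < len(sql) and depth > 0:
            if sql[i] == "(":
                depth += 1
            elif sql[i] == ")":
                depth -= 1
            i += 1
        ctes.append((match.group(1), sql[match.end():i - 1]))
        pos = i
    return ctes


def check_ref_ctes(sql):
    """Flag ref CTEs that are not 'select *' or that follow a non-ref CTE."""
    problems = []
    seen_non_ref = False
    for name, body in extract_ctes(sql):
        if REF_CALL.search(body):
            if not SELECT_STAR_FROM_REF.match(body):
                problems.append(f"{name}: ref CTE should be 'select *' only")
            if seen_non_ref:
                problems.append(f"{name}: ref CTE should appear above non-ref CTEs")
        else:
            seen_non_ref = True
    return problems
```

Calling `check_ref_ctes` on a model returns a list of messages, one per violation, which lines up with the idea of having a separate rule for each check.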
For formatting, we would need a new rule enforcing a specific number of indentations (4 spaces) at the select, from and group statements. Another rule would also be needed to enforce a new line after CTE declarations (the opening parenthesis), enclosed queries, and closures (the closing parenthesis).
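As a line-based illustration of these formatting checks, the sketch below flags a missing blank line after a CTE opening, before a CTE closing (that is, after the enclosed query), and after a CTE closing, and flags body lines indented by fewer than four spaces. It assumes each CTE opens on a line ending in `as (` and closes on a line containing only `)` or `),`. The function name `check_cte_layout` is hypothetical and this is not how sqlfluff rules are implemented.

```python
def check_cte_layout(sql):
    """Return (line_number, message) pairs for CTE layout problems."""
    problems = []
    lines = sql.splitlines()
    inside = False
    for number, line in enumerate(lines, start=1):
        # `number` is 1-based, so lines[number] is the line that follows.
        following = lines[number] if number < len(lines) else ""
        stripped = line.strip()
        if not inside and stripped.endswith("as ("):
            inside = True
            if following.strip():
                problems.append((number, "expected a blank line after the CTE opening"))
        elif inside and stripped in (")", "),"):
            inside = False
            if lines[number - 2].strip():
                problems.append((number, "expected a blank line before the CTE closing"))
            if following.strip():
                problems.append((number, "expected a blank line after the CTE closing"))
        elif inside and stripped and not line.startswith("    "):
            # Only checks for at least four leading spaces, not exact indent depth.
            problems.append((number, "expected the CTE body to be indented four spaces"))
    return problems
```

Whether this ends up as one rule or several is a design choice; the sketch keeps the checks in one pass only for brevity.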
Rules
1. Select from {{ ref() }} CTEs should be * only: relates to https://github.com/sqlfluff/sqlfluff/issues/380
2. Select from {{ ref() }} CTEs should occur above non {{ ref() }} CTEs
3. Expression block should be indented four spaces
4. One new line after a CTE declaration (
5. One new line after a CTE enclosed query
6. One new line after a CTE closure ). Enforced by L022
SQL Styling
Rules
7. Fields should be stated before aggregates / window functions
8. Specify join keys - do not use using. Certain warehouses have inconsistencies in using results (specifically Snowflake).
9. Prefer union all to union *
10. Avoid table aliases in join conditions (especially initialisms) – it’s harder to understand what the table called “c” is compared to “customers”.
11. If joining two or more tables, always prefix your column names with the table alias. If only selecting from one table, prefixes are not needed. Enforced by L027
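Two of these styling rules (join keys via using, and union without all) are simple enough to approximate with regular expressions. The sketch below is only an illustration under that assumption: the pattern names and `check_style` are hypothetical, it strips `--` line comments naively, and it would also flag `union distinct`. A real sqlfluff rule would inspect parsed segments instead.

```python
import re

# Hypothetical patterns for illustration; not part of sqlfluff.
LINE_COMMENT = re.compile(r"--[^\n]*")
JOIN_USING = re.compile(r"\bjoin\b[^;]*?\busing\s*\(", re.IGNORECASE)
BARE_UNION = re.compile(r"\bunion\b(?!\s+all\b)", re.IGNORECASE)


def check_style(sql):
    """Flag 'join ... using (...)' and 'union' that is not 'union all'."""
    code = LINE_COMMENT.sub("", sql)  # naive: ignores '--' inside string literals
    problems = []
    if JOIN_USING.search(code):
        problems.append("specify join keys with 'on' instead of 'using'")
    if BARE_UNION.search(code):
        problems.append("prefer 'union all' to 'union'")
    return problems
```

The remaining styling rules (field ordering, alias usage) depend on knowing which expressions are aggregates and which names are aliases, so they would need the parser rather than pattern matching.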
The numbered rules can map directly to a set of new rules in sqlfluff. Very keen for thoughts on all of this; these ideas come from a relatively brief scan of the current rule reference.
Issue Analytics
- Created 3 years ago
- Comments: 8 (8 by maintainers)
I’m up for the general idea of this. I think there will be some tricky corners in this, but it’s probably sensible to deal with them case by case.
@NiallRees - For 7, 8, 9 & 10 I think each should have their own issue on GitHub for us to track progress. 1 & 2 should probably be part of the same issue, and might need some thinking to work out how to do. I’m tempted to have 3, 4 & 5 as one issue too, and then whoever picks that up can decide whether it’s three new rules, or something more integrated than that.
@pwildenhain Thanks a lot, I have updated the description. I think https://github.com/sqlfluff/sqlfluff/issues/380 actually differs from the dbt coding conventions, which say that a reference to an external table should always initially be select *, but that the columns should then also be specified in a second CTE. So the output of the entire query should never have an indeterminate number of columns.