Closed in favour of individual issues - Support for all major syntactical opinions of the dbt coding conventions, a comprehensive breakdown.

Why?

I’d be really interested to conduct a survey of current sqlfluff users to find out where they are applying sqlfluff linting. Anecdotally, everyone I know who uses sqlfluff applies it to a dbt project, and I’m confident that fully supporting the dbt style guide would dramatically increase uptake.

https://github.com/fishtown-analytics/corp/blob/master/dbt_coding_conventions.md

Of the conventions that are reasonable to identify programmatically, we are currently missing:

CTEs

  1. All {{ ref(‘…’) }} statements should be placed in CTEs at the top of the file
  2. CTEs should be formatted like this:
with

events as (

    ...

),

-- CTE comments go here
filtered_events as (

    ...

)

select * from filtered_events

The example here makes the rules clearer: CTEs that use {{ ref() }} should be select * statements at the top of the file. The {{ ref() }} function can only be identified explicitly before templating, but using an internal jinja function to template {{ ref() }} to a table reference function that the parser recognises should solve this. We could then check whether the select statement contains only a *, and whether the CTE appears below any CTEs with explicit column selections. I think there would likely be a rule for each.
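The two checks described above can be sketched in plain Python. This is only an illustration under stated assumptions, not how sqlfluff implements rules: the `dbt_ref__` marker, the regexes and the function names are all hypothetical, and the scanner ignores comments and string literals that contain parentheses.

```python
import re

# Hypothetical marker prefix used to stand in for a templated ref();
# it is not part of sqlfluff or dbt.
REF_MARKER = "dbt_ref__"
REF_CALL = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
CTE_START = re.compile(r"(\w+)\s+as\s*\(", re.IGNORECASE)


def template_refs(sql):
    """Replace each {{ ref('model') }} with a plain, parseable table name."""
    return REF_CALL.sub(lambda m: REF_MARKER + m.group(1), sql)


def iter_ctes(sql):
    """Yield (name, body) for each CTE by walking to the matching ')'."""
    pos = 0
    while True:
        match = CTE_START.search(sql, pos)
        if match is None:
            return
        depth, i = 1, match.end()
        while i < len(sql) and depth:
            depth += {"(": 1, ")": -1}.get(sql[i], 0)
            i += 1
        yield match.group(1), sql[match.end():i - 1]
        pos = i


def check_ref_ctes(sql):
    """Return violations of the two proposed ref-CTE rules."""
    problems = []
    seen_non_ref = False
    for name, body in iter_ctes(template_refs(sql)):
        if REF_MARKER in body:
            normalised = " ".join(body.lower().split())
            if not normalised.startswith("select * from"):
                problems.append(f"{name}: ref CTE should be select * only")
            if seen_non_ref:
                problems.append(f"{name}: ref CTE should precede non-ref CTEs")
        else:
            seen_non_ref = True
    return problems
```

Calling `check_ref_ctes` on a model returns an empty list when every ref CTE is a `select *` placed above the other CTEs. A real rule would work on the parsed tree rather than on regexes.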

For formatting, we would need a new rule enforcing a specific indentation (4 spaces) at the select, from and group statements. Another rule would be needed to enforce a new line after a CTE declaration’s opening (, after the enclosed query, and after the closing ).

Rules

  1. Select from {{ ref() }} CTEs should be * only: relates to https://github.com/sqlfluff/sqlfluff/issues/380
  2. Select from {{ ref() }} CTEs should occur above non {{ ref() }} CTEs
  3. Expression block should be indented four spaces
  4. One new line after a CTE declaration (
  5. One new line after a CTE enclosed query
  6. One new line after a CTE closure ). Enforced by L022
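As a rough illustration of rules 3 to 5, the following line-based sketch checks the layout shown in the formatting example earlier in this issue. It assumes CTE declarations and closures sit alone on unindented lines; the function and pattern names are hypothetical, and a real sqlfluff rule would operate on parsed segments instead of raw lines.

```python
import re

# A CTE declaration alone on its line, e.g. "events as ("
CTE_OPEN = re.compile(r"^\w+\s+as\s*\(\s*$", re.IGNORECASE)
# A CTE closure alone on its line: ")" or "),"
CTE_CLOSE = re.compile(r"^\),?\s*$")


def check_cte_layout(sql):
    """Return layout problems for CTE bodies, one message per violation."""
    lines = sql.splitlines()
    problems = []
    inside = False
    for i, line in enumerate(lines):
        if not inside and CTE_OPEN.match(line):
            inside = True
            following = lines[i + 1:i + 3]
            # Expect exactly one blank line, then content (or the closure).
            if len(following) < 2 or following[0].strip() or not following[1].strip():
                problems.append(f"line {i + 1}: expected one blank line after '('")
        elif inside and CTE_CLOSE.match(line):
            inside = False
            preceding = lines[max(i - 2, 0):i]
            # Expect content (or the declaration), then exactly one blank line.
            if len(preceding) < 2 or preceding[1].strip() or not preceding[0].strip():
                problems.append(f"line {i + 1}: expected one blank line before ')'")
        elif inside and line.strip() and not line.startswith("    "):
            problems.append(f"line {i + 1}: CTE body should be indented four spaces")
    return problems
```

Whether these end up as three separate rules or one combined rule is left open; the sketch only shows that each condition can be checked independently.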

SQL Styling

Rules

  7. Fields should be stated before aggregates / window functions
  8. Specify join keys - do not use using. Certain warehouses have inconsistencies in using results (specifically Snowflake).
  9. Prefer union all to union
  10. Avoid table aliases in join conditions (especially initialisms) – it’s harder to understand what the table called “c” is compared to “customers”.
  11. If joining two or more tables, always prefix your column names with the table alias. If only selecting from one table, prefixes are not needed. Enforced by L027
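Two of these styling rules (specifying join keys instead of using, and preferring union all) are simple enough to sketch as a line scan. The names below are hypothetical and the approach is deliberately naive: it strips trailing `--` comments, flags `union distinct` as well as bare `union`, and would misreport a `union` whose `all` sits on the next line.

```python
import re

USING_CLAUSE = re.compile(r"\busing\s*\(", re.IGNORECASE)
# "union" not immediately followed by "all"
BARE_UNION = re.compile(r"\bunion\b(?!\s+all\b)", re.IGNORECASE)


def check_style(sql):
    """Return (line number, message) pairs for the two styling checks."""
    problems = []
    for number, line in enumerate(sql.splitlines(), start=1):
        code = line.split("--", 1)[0]  # ignore trailing line comments
        if USING_CLAUSE.search(code):
            problems.append((number, "specify join keys instead of using"))
        if BARE_UNION.search(code):
            problems.append((number, "prefer union all to union"))
    return problems
```

For example, a query with `join b using (id)` on line 2 and a bare `union` on line 3 produces one entry for each of those lines.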

The numbered rules can map directly to a set of new rules in sqlfluff. I’m very keen to hear thoughts on all of this; these are my impressions from a relatively brief scan of the current rule reference.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
alanmcruickshank commented, Oct 5, 2020

I’m up for the general idea of this. I think there will be some tricky corners in this, but it’s probably sensible to deal with them case by case.

  • Some are going to be tricky because we lint after templating and not before, so we might need to work out a way to get some metadata out from the templater to work with. This covers 1 & 2.
  • Some are tricky because I know some people have strong views 😄 : I’m thinking 8.
  • Some we should think about whether they should be in their own rule, or whether they should be part of existing or larger rules: 3, 4 & 5
  • The rest shouldn’t be too painful: 7, 9 & 10.

@NiallRees - For 7, 8, 9 & 10 I think each should have their own issue on Github for us to track progress. 1 & 2 should probably be part of the same issue, and might need some thinking to work out how to do. I’m tempted to have 3, 4 & 5 as one issue too, and then whoever picks that up can decide whether it’s three new rules, or something more integrated than that.

1 reaction
NiallRees commented, Oct 4, 2020

@pwildenhain Thanks a lot, I have updated the description. I think https://github.com/sqlfluff/sqlfluff/issues/380 actually differs from the dbt coding conventions, which say that a reference to an external table should always initially be select *, but that the columns should then also be specified in a second CTE. So the output of the entire query should never have an indeterminate number of columns.
