Reorganize rule names
Search before asking
- I searched the issues and found no similar issues.
Description
Let’s think about a nice way to rename rules/codes, so that we don’t have to look up what the rule code is every time we think about them 😵
Here’s an example of what pylint messages look like:
$ pylint etl.py
************* Module etl
etl.py:20:61: C0303: Trailing whitespace (trailing-whitespace)
etl.py:1:0: C0114: Missing module docstring (missing-module-docstring)
etl.py:7:0: W0622: Redefining built-in 'compile' (redefined-builtin)
etl.py:7:0: E0401: Unable to import 'parse' (import-error)
------------------------------------------------------------------
Your code has been rated at 5.56/10 (previous run: 9.95/10, -4.40)
You can see that they have “common sense” names at the end of each linting message.
Here’s an example of what the current output looks like for sqlfluff lint:
$ sqlfluff lint models/output/
=== [dbt templater] Sorting Nodes...
=== [dbt templater] Compiling dbt project...
=== [dbt templater] Project Compiled.
== [models/output/redcap_import.sql] FAIL
L: 6 | P: 5 | L019 | Found leading comma. Expected only trailing.
L: 6 | P: 6 | L008 | Commas should be followed by a single whitespace unless
| followed by a comment.
L: 9 | P: 40 | L012 | Implicit/explicit aliasing of columns.
L: 17 | P: 6 | L003 | Expected 1 indentations, found 1 [compared to line 16]
For the record I think on the whole, our output is better organized/nicer to look at than pylint. But I think it would be greatly improved if we could implement common sense names like this:
$ sqlfluff lint models/output/
=== [dbt templater] Sorting Nodes...
=== [dbt templater] Compiling dbt project...
=== [dbt templater] Project Compiled.
== [models/output/redcap_import.sql] FAIL
L: 6 | P: 5 | L019 | Found leading comma. Expected only trailing. (wrong-comma-style)
L: 6 | P: 6 | L008 | Commas should be followed by a single whitespace unless
| followed by a comment. (comma-missing-whitespace)
L: 9 | P: 40 | L012 | Implicit/explicit aliasing of columns. (wrong-column-alias-style)
L: 17 | P: 6 | L003 | Expected 1 indentations, found 1 [compared to line 16] (unmatched-indentation)
With this change, I could ideally also disable/enable rules according to their readable names and/or their codes, such as
select
field_1,
field_2, --noqa: unmatched-indentation
from my_table
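As a rough illustration of the idea above, here is a minimal Python sketch of how a `--noqa:` comment could accept either a readable name or a rule code. The `NAME_TO_CODE` table and the `ignored_codes` helper are hypothetical (they only reuse the names suggested in this issue) and are not sqlfluff's actual noqa parser.

```python
import re
from typing import Set

# Hypothetical name -> code table, built from the names proposed in this issue.
NAME_TO_CODE = {
    "wrong-comma-style": "L019",
    "comma-missing-whitespace": "L008",
    "wrong-column-alias-style": "L012",
    "unmatched-indentation": "L003",
}

# Matches "--noqa: ref1, ref2" and captures the comma separated references.
NOQA_PATTERN = re.compile(r"--\s*noqa\s*:\s*(?P<refs>[\w\-, ]+)", re.IGNORECASE)


def ignored_codes(sql_line: str) -> Set[str]:
    """Return the rule codes silenced by a --noqa comment on this line."""
    match = NOQA_PATTERN.search(sql_line)
    if not match:
        return set()
    codes = set()
    for ref in match.group("refs").split(","):
        ref = ref.strip()
        if not ref:
            continue
        # A known readable name maps to its code; anything else is
        # assumed to already be a rule code.
        codes.add(NAME_TO_CODE.get(ref.lower(), ref.upper()))
    return codes
```

With this sketch, `--noqa: unmatched-indentation` and `--noqa: L003` would silence the same rule, so existing comments that use codes would keep working.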
You’ll also notice from the above example that pylint codes have different prefixes (W0622, E0401, etc.)
Here are their definitions for those, but basically E = Error, W = Warning, R = Refactor, C = Convention
And this brings us to the second part of this issue: what the codes should be. I think we can follow pylint’s example, and break up the rules into categories. Here are a few to start:
R = readability
- Operators should follow a standard for being before/after newlines (L007 --> R001)
- Inconsistent capitalisation of keywords (L010 --> R002)
C = Convention (AKA Best Practices)
- Implicit/explicit aliasing of table (L011 --> C002)
- Table aliases should be unique within each clause (L020 --> C002)
- Trailing commas within select clause. (L038 --> C002)
W = Whitespace
- Indentation not consistent with previous lines (L003 --> W001)
- Operators should be surrounded by a single whitespace (L006 --> W002)
D = dialect specific
- SP_ prefix should not be used for user-defined stored procedures in T-SQL. (L056 --> D001)
Use case
- Give all the rules “symbolic” names or “short” names
- Include these in the lint messages
- Allow these to be disabled in the config by short name and/or rule code
- Consider if rules should be disabled by default (I’m looking at you, L052. And yes, I’m not going to say what rule that is here, in order to prove a point on why this is an important change to make 😜)
- Should we more tightly integrate these with rule groups? Should these replace rule groups, and all just become implicit rule groups?
Dialect
All
Are you willing to work on and submit a PR to address the issue?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- Created 10 months ago
- Comments: 16 (13 by maintainers)
A lot of linters (pylint included?) support both. I don’t see why we wouldn’t do the same? Maybe for 3.0.0 we drop the older ones, but I wouldn’t for 2.0.0 - that would be quite a big change IMHO.
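Supporting both the older codes and the new category-prefixed ones could look roughly like the Python sketch below. It reuses a subset of the example mappings from the issue description; the `LEGACY_TO_NEW` table and both helper functions are hypothetical names for illustration, not anything that exists in sqlfluff.

```python
from typing import Dict

# Category prefixes proposed in the issue description.
CATEGORIES: Dict[str, str] = {
    "R": "readability",
    "C": "convention",
    "W": "whitespace",
    "D": "dialect specific",
}

# Legacy -> proposed codes. Only a subset of the issue's examples is listed.
LEGACY_TO_NEW: Dict[str, str] = {
    "L007": "R001",
    "L010": "R002",
    "L003": "W001",
    "L006": "W002",
    "L056": "D001",
}


def canonical_code(code: str) -> str:
    """Accept a legacy or a new code and return the new code."""
    code = code.strip().upper()
    if code in LEGACY_TO_NEW:
        return LEGACY_TO_NEW[code]
    if code in LEGACY_TO_NEW.values():
        return code
    raise KeyError(f"Unknown rule code: {code}")


def category_of(code: str) -> str:
    """Look up the category from the first letter of the new code."""
    return CATEGORIES[canonical_code(code)[0]]
```

Dropping the older codes in a later major release would then amount to removing the `LEGACY_TO_NEW` lookup, while configs written against the new codes stay valid.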
Been looking at the code today and getting some initial thoughts down
New selection syntax should support
And it should support nested configs, meaning if we have a .sqlfluff file in the project root AND in one of the child directories, then we carry over selected rules from the root directory and further include/exclude rules based off configuration in the child directory.
On a separate note, I don’t think that nesting rules/exclude_rules in the .sqlfluff currently works super well, though I might be approaching it the wrong way. I can talk more about this at the next maintainers meeting.
Here’s one potential approach:
- We’ll still keep rules and exclude_rules config values.
- We support comma separated selection/exclusion e.g. core,LB005,layout,capitalisation.literals
- Before any filtering is done, the selection input is expanded into a list of rule names e.g. "layout.spacing,L009" --> [LS001,LS002,LS003,LB001]
- It then replaces any selection configuration that came before it (nested configs)
- Now we have our final rules selection
- Follow the same process for exclude_rules (including exclusion configuration that comes before it)
- Then filter the rules being included against the rules being excluded, and you have your rules to run for a given file(s).
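The expand-then-filter steps above could be sketched in Python as follows. This is only a sketch under stated assumptions: the `SELECTORS` table is hypothetical metadata that mirrors the `"layout.spacing,L009"` example, each config is a plain dict rather than a parsed .sqlfluff file, and a missing `rules` key is assumed to mean "all rules".

```python
from typing import Dict, List

# Hypothetical rule universe and selector metadata for illustration only.
ALL_RULES: List[str] = ["LS001", "LS002", "LS003", "LB001"]
SELECTORS: Dict[str, List[str]] = {code: [code] for code in ALL_RULES}
SELECTORS.update(
    {
        "layout.spacing": ["LS001", "LS002", "LS003"],
        "L009": ["LB001"],  # legacy code re-mapped to a new code
    }
)


def expand(selection: str) -> List[str]:
    """Expand a comma separated selection into individual rule codes."""
    expanded: List[str] = []
    for token in selection.split(","):
        token = token.strip()
        if not token:
            continue
        for code in SELECTORS[token]:
            if code not in expanded:  # keep order, drop duplicates
                expanded.append(code)
    return expanded


def rules_to_run(configs: List[Dict[str, str]]) -> List[str]:
    """Resolve nested configs, ordered root first and deepest directory last."""
    selected = list(ALL_RULES)  # assumption: no `rules` key selects everything
    excluded: List[str] = []
    for config in configs:
        # A later (more deeply nested) value replaces the earlier one.
        if "rules" in config:
            selected = expand(config["rules"])
        if "exclude_rules" in config:
            excluded = expand(config["exclude_rules"])
    # Finally filter the included rules against the excluded ones.
    return [code for code in selected if code not in excluded]
```

For example, a root config of `{"rules": "layout.spacing,L009"}` combined with a child config of `{"exclude_rules": "LS002"}` leaves LS001, LS003 and LB001 to run for files in the child directory.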
I think in order to do proper expansion from criteria to individual rules we’ll need to collect some sort of rule metadata at the beginning of linting that we can reference while doing the selection (similar to how dbt looks at the manifest.json to help with model selection). It might even be smart to follow their lead and include a manifest when sqlfluff is installed that has all this metadata. We could auto-generate it whenever rules are added/re-categorized, and include it as an asset in the package.
Lastly, with this approach it would no longer allow as much fine tuning. For example, under this proposal if someone only wants to run L005, it’s going to be re-mapped to LS001 which actually includes L001,L005,L006,L008,L015,L017,L023,L024,L039,L048,L050. I think this is actually a big improvement, though it is a breaking change. But if someone only wants L005 and not all these other rules, they’ll be disappointed.
I know this was a little rambly, but I’m happy to take any feedback.
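To make the manifest idea a bit more concrete, here is a minimal Python sketch of generating rule metadata as JSON and inverting it into a selector lookup. The `RuleMeta` fields, the manifest layout, and both function names are assumptions made for illustration; neither sqlfluff nor dbt defines this format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class RuleMeta:
    """Hypothetical per-rule metadata that would be stored in the manifest."""

    code: str
    name: str
    groups: List[str] = field(default_factory=list)
    legacy_codes: List[str] = field(default_factory=list)


def build_manifest(rules: List[RuleMeta]) -> str:
    """Serialise rule metadata; could be regenerated whenever rules change."""
    payload = {"rules": {rule.code: asdict(rule) for rule in rules}}
    return json.dumps(payload, indent=2, sort_keys=True)


def selector_index(manifest_json: str) -> Dict[str, List[str]]:
    """Map every code, name, group and legacy code to the new rule codes."""
    index: Dict[str, List[str]] = {}
    for code, meta in json.loads(manifest_json)["rules"].items():
        keys = [code, meta["name"], *meta["groups"], *meta["legacy_codes"]]
        for key in keys:
            index.setdefault(key, []).append(code)
    return index
```

Because several legacy codes can point at one new code in such an index, a lookup for L005 would return LS001 as a whole, which is the loss of fine tuning described above.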