"sqlfluff fix" is very slow
I ran sqlfluff fix --rules L001,L003,L006,L008,L010,L011,L016,L018,L019,L022,L023,L030
on the file provided below. It took about 5 minutes to run. The file is only 250 lines long, so this seems excessive. During the run, I saw the message "WARNING: One fix for L018 not applied, it would re-cause a previously fixed error." repeated many times (about 230, I think).
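To make the L018 warning concrete, here is a toy model of a fix loop with a "don't re-cause a previously fixed error" guard. It is a minimal sketch under stated assumptions, not sqlfluff's implementation: the rule codes ADD_SEMI and NO_SEMI and the function fix_with_loop_guard are hypothetical, and each toy "rule" simply returns a fixed copy of the text or None. It shows how two conflicting rules can cause the same fix to be proposed and rejected on successive loops, which logs the same warning repeatedly.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy rule: returns a fixed copy of the text, or None if it complies.
Rule = Callable[[str], Optional[str]]


def fix_with_loop_guard(
    text: str, rules: Dict[str, Rule], max_loops: int = 10
) -> Tuple[str, List[str]]:
    """Apply rule fixes repeatedly, skipping any fix that would make a
    previously fixed rule fail again (toy model, not sqlfluff's code)."""
    fixed_before: Set[str] = set()
    warnings: List[str] = []
    for _ in range(max_loops):
        changed = False
        for code, rule in rules.items():
            candidate = rule(text)
            if candidate is None:
                continue  # no violation for this rule
            # Re-lint the candidate with every other rule we already fixed.
            recaused = [
                other
                for other in fixed_before
                if other != code and rules[other](candidate) is not None
            ]
            if recaused:
                warnings.append(
                    f"One fix for {code} not applied, "
                    "it would re-cause a previously fixed error."
                )
                continue
            text = candidate
            fixed_before.add(code)
            changed = True
        if not changed:
            break  # nothing was applied in this loop, so stop
    return text, warnings


# Two deliberately conflicting toy rules.
toy_rules: Dict[str, Rule] = {
    "ADD_SEMI": lambda t: None if t.endswith(";") else t + ";",
    "NO_SEMI": lambda t: t[:-1] if t.endswith(";") else None,
}

result, warns = fix_with_loop_guard("SELECT 1", toy_rules)
print(result)      # SELECT 1;
print(len(warns))  # 2
```

In this toy, the rejected NO_SEMI fix is proposed again on the next loop, so the warning is logged once per loop in which it reappears; with many conflicting locations in a large file, the repeated warnings and extra loops add up.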
Query:
-- This query generated by script/generate_corr_queries.py and should probably not be
-- modified manually. Instead, make changes to that script and rerun it.
WITH
raw_effect_sizes AS (
SELECT
COUNT(1) AS campaign_count,
state_user_v_peer_open
,business_type
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_small_subject_line to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_small_subject_line), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_small_subject_line)) AS open_uses_small_subject_line
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_personal_subject to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_personal_subject), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_personal_subject)) AS open_uses_personal_subject
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_timewarp to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_timewarp), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_timewarp)) AS open_uses_timewarp
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_small_preview to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_small_preview), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_small_preview)) AS open_uses_small_preview
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_personal_to to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_personal_to), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_personal_to)) AS open_uses_personal_to
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_ab_test_subject to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_ab_test_subject), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_ab_test_subject)) AS open_uses_ab_test_subject
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_ab_test_content to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_ab_test_content), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_ab_test_content)) AS open_uses_ab_test_content
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_preview_text to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_preview_text), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_preview_text)) AS open_uses_preview_text
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_sto to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_sto), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_sto)) AS open_uses_sto
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_freemail_from to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_freemail_from), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_freemail_from)) AS open_uses_freemail_from
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_resend_non_openers to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_resend_non_openers), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_resend_non_openers)) AS open_uses_resend_non_openers
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_promo_code to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_promo_code), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_promo_code)) AS open_uses_promo_code
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_prex to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_prex), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_prex)) AS open_uses_prex
-- The following is the slope of the regression line. Note that CORR (which is the Pearson's correlation
-- coefficient is symmetric in its arguments, but since STDDEV_POP(open_rate_su) appears in the
-- numerator this is the slope of the regression line considering STDDEV_POP(open_rate_su) to be
-- the "y variable" and uses_ab_test_from to be the "x variable" in terms of the regression line.
,SAFE_DIVIDE(SAFE_MULTIPLY(CORR(open_rate_su, uses_ab_test_from), STDDEV_POP(open_rate_su)), STDDEV_POP(uses_ab_test_from)) AS open_uses_ab_test_from
FROM
`{{gcp_project}}.{{dataset}}.global_actions_states`
GROUP BY
state_user_v_peer_open
,business_type),
imputed_effect_sizes AS (
SELECT
campaign_count,
state_user_v_peer_open
,business_type
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_small_subject_line), 0, open_uses_small_subject_line), 0) AS open_uses_small_subject_line
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_personal_subject), 0, open_uses_personal_subject), 0) AS open_uses_personal_subject
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_timewarp), 0, open_uses_timewarp), 0) AS open_uses_timewarp
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_small_preview), 0, open_uses_small_preview), 0) AS open_uses_small_preview
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_personal_to), 0, open_uses_personal_to), 0) AS open_uses_personal_to
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_ab_test_subject), 0, open_uses_ab_test_subject), 0) AS open_uses_ab_test_subject
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_ab_test_content), 0, open_uses_ab_test_content), 0) AS open_uses_ab_test_content
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_preview_text), 0, open_uses_preview_text), 0) AS open_uses_preview_text
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_sto), 0, open_uses_sto), 0) AS open_uses_sto
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_freemail_from), 0, open_uses_freemail_from), 0) AS open_uses_freemail_from
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_resend_non_openers), 0, open_uses_resend_non_openers), 0) AS open_uses_resend_non_openers
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_promo_code), 0, open_uses_promo_code), 0) AS open_uses_promo_code
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_prex), 0, open_uses_prex), 0) AS open_uses_prex
-- We now impute the value of the effect size to 0 if it was NaN or NULL. This is to
-- take into account states where all campaigns either did or did not perform an
-- action. In these cases, we assume that campaign outcome is uncorrelated with
-- the action because we do not have evidence otherwise.
,COALESCE(IF(IS_NAN(open_uses_ab_test_from), 0, open_uses_ab_test_from), 0) AS open_uses_ab_test_from
FROM
raw_effect_sizes
),
action_states AS (
SELECT
has_used_small_subject_line
,has_used_personal_subject
,has_used_timewarp
,has_used_small_preview
,has_used_personal_to
,has_used_ab_test_subject
,has_used_ab_test_content
,has_used_preview_text
,has_used_sto
,has_used_freemail_from
,has_used_resend_non_openers
,has_used_promo_code
,has_used_prex
,has_used_ab_test_from
FROM `{{gcp_project}}.{{dataset}}.global_state_space`
GROUP BY has_used_small_subject_line
,has_used_personal_subject
,has_used_timewarp
,has_used_small_preview
,has_used_personal_to
,has_used_ab_test_subject
,has_used_ab_test_content
,has_used_preview_text
,has_used_sto
,has_used_freemail_from
,has_used_resend_non_openers
,has_used_promo_code
,has_used_prex
,has_used_ab_test_from)
SELECT
imputed_effect_sizes.*,
has_used_small_subject_line
,has_used_personal_subject
,has_used_timewarp
,has_used_small_preview
,has_used_personal_to
,has_used_ab_test_subject
,has_used_ab_test_content
,has_used_preview_text
,has_used_sto
,has_used_freemail_from
,has_used_resend_non_openers
,has_used_promo_code
,has_used_prex
,has_used_ab_test_from
FROM
imputed_effect_sizes
CROSS JOIN action_states
ORDER BY campaign_count DESC
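The generated comments describe each effect size as a regression slope built from CORR and STDDEV_POP, with NaN or NULL results imputed to 0. The Python sketch below checks that arithmetic on made-up numbers (illustrative only, not the reporter's data): slope = r * sd(y) / sd(x) matches the ordinary least-squares slope, and the COALESCE(IF(IS_NAN(...), 0, ...), 0) pattern maps both NaN and NULL to 0. The helper names are hypothetical, and modelling BigQuery's SAFE_DIVIDE as "NULL on division by zero" and CORR as "NaN when a variance is zero" is an assumption about their edge-case behaviour.

```python
import math
from typing import List, Optional


def pstdev(xs: List[float]) -> float:
    """Population standard deviation, like STDDEV_POP."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def corr(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; NaN when either input has zero variance."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return math.nan
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def safe_divide(num: float, den: float) -> Optional[float]:
    """Mimics SAFE_DIVIDE: None (SQL NULL) instead of a division-by-zero error."""
    return None if den == 0 else num / den


def slope(y: List[float], x: List[float]) -> Optional[float]:
    # SAFE_DIVIDE(SAFE_MULTIPLY(CORR(y, x), STDDEV_POP(y)), STDDEV_POP(x))
    return safe_divide(corr(y, x) * pstdev(y), pstdev(x))


def impute(v: Optional[float]) -> float:
    # COALESCE(IF(IS_NAN(v), 0, v), 0): NULL or NaN both become 0
    return 0.0 if v is None or math.isnan(v) else v


def ols_slope(y: List[float], x: List[float]) -> float:
    """Least-squares slope of y on x, for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)


open_rate = [0.1, 0.3, 0.2, 0.5]  # made-up "y variable"
uses_action = [0, 1, 0, 1]        # made-up binary "x variable"
print(slope(open_rate, uses_action), ols_slope(open_rate, uses_action))  # both ~0.25

# Every campaign used the action: STDDEV_POP(x) = 0, so the slope is NULL -> 0.
print(impute(slope(open_rate, [1, 1, 1, 1])))  # 0.0
# Constant outcome: CORR is NaN but STDDEV_POP(x) > 0, so the slope is NaN -> 0.
print(impute(slope([0.5] * 4, uses_action)))  # 0.0
```

With a 0/1 x variable, this slope equals the difference in mean open rate between campaigns that did and did not use the action (0.4 - 0.15 = 0.25 here), which is why imputing 0 reads as "no measured effect".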
.sqlfluff:
[sqlfluff]
verbose = 0
nocolor = False
dialect = bigquery
templater = jinja
rules = None
exclude_rules = None
recurse = 0
[sqlfluff:templater:jinja:macros]
# Some rules can be configured directly from the config common to other rules.
[sqlfluff:rules]
tab_space_size = 4
# Some rules have their own specific config.
[sqlfluff:rules:L010]
capitalisation_policy = consistent
[sqlfluff:templater:jinja:context]
dataset=dataset
gcp_project=gcp_project
benchmark_user_map_project=project
benchmark_user_map_dataset=summary
benchmark_user_map_table=benchmark_user_map
benchmark_summaries_project=project
benchmark_summaries_dataset=summary
benchmark_summaries_table=benchmark_summaries
campaign_performance_project=project
campaign_performance_dataset=summary
campaign_performance_table=campaign_performance
user_average_project=project
user_average_dataset=summary
user_average_table=average_user_performance
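With templater = jinja, the values under [sqlfluff:templater:jinja:context] fill in placeholders such as {{gcp_project}} and {{dataset}} before the SQL is parsed, so the table reference in the FROM clause becomes a plain identifier. The sketch below imitates that substitution using only the Python standard library: configparser reads a trimmed copy of the config, and a regular expression stands in for Jinja. The helper render_placeholders is hypothetical and handles only bare {{name}} variables, not full Jinja syntax.

```python
import configparser
import re
from typing import Dict

# Trimmed copy of the .sqlfluff file above (only the parts used here).
CONFIG = """
[sqlfluff]
dialect = bigquery
templater = jinja

[sqlfluff:templater:jinja:context]
dataset=dataset
gcp_project=gcp_project
"""


def render_placeholders(sql: str, context: Dict[str, str]) -> str:
    """Replace bare {{ name }} placeholders with context values.
    Hypothetical stand-in for Jinja rendering: no filters, loops, or macros."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: context[m.group(1)], sql)


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
context = dict(parser["sqlfluff:templater:jinja:context"])

rendered = render_placeholders(
    "FROM `{{gcp_project}}.{{dataset}}.global_actions_states`", context
)
print(rendered)  # FROM `gcp_project.dataset.global_actions_states`
```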
Issue Analytics
- Created 3 years ago
- Comments: 13 (12 by maintainers)
@barrywhart - I merged my most recent changes in #413. It's faster, but not yet as fast as I think we would hope.
I think, as @aaronsteers says, part of the reason this query takes so long to fix is that so many issues are found in it; if it were fully compliant, it would finish much faster. I'm all ears for more suggestions on this, though! All my gambits on this are merged for now.
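The point that a file with many violations takes longer to fix can be illustrated with a toy cost model. In this deliberately simplified sketch (hypothetical functions, not sqlfluff's algorithm), every single fix is followed by a full re-lint, so the work grows roughly with the number of violations times the file length. The real tool groups fixes differently, so only the trend, not the numbers, carries over.

```python
from typing import List, Tuple


def lint(lines: List[str]) -> List[int]:
    """Toy rule: report the indices of lines with trailing whitespace."""
    return [i for i, line in enumerate(lines) if line != line.rstrip()]


def fix_one_at_a_time(lines: List[str]) -> Tuple[List[str], int]:
    """Fix one violation, then re-lint the whole file; count lines scanned."""
    lines = list(lines)
    lines_scanned = 0
    while True:
        violations = lint(lines)
        lines_scanned += len(lines)
        if not violations:
            return lines, lines_scanned
        first = violations[0]
        lines[first] = lines[first].rstrip()


clean = ["SELECT 1"] * 250
dirty = ["SELECT 1   "] * 250
print(fix_one_at_a_time(clean)[1])  # 250: one pass, nothing to fix
print(fix_one_at_a_time(dirty)[1])  # 62750: 251 passes over 250 lines
```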
Thanks! Yes, this may be good enough. I’m still working on freeing up time in my schedule. We’ve used SQLFluff successfully on several projects. The next steps I’m considering will include: