
dbt runs slowly on Redshift RA3 nodes due to excessive commits

See original GitHub issue

Describe the bug

This is similar to #2748, but is Redshift-specific. However, I think users of other databases would benefit from slightly faster runs.

The next-gen Redshift RA3 nodes seem to have very slow COMMITs, due to the fact that:

  • Redshift only processes a single transaction/COMMIT at a time, and
  • RA3 nodes write their commits to S3; this makes each COMMIT take about a second (🤮)

Many dbt users have reported excessively slow runs creating views on RA3 nodes. In my experience, creating views while running dbt in multiple threads can take upwards of 20 seconds of clock time (see logs below).
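To see how much of that time is commit queueing versus the commit itself, a query along these lines against the STL_COMMIT_STATS system table can help (an illustrative sketch, not taken from the run below; node = -1 is the leader-node row):

select
    xid,
    queuelen,
    datediff(milliseconds, startqueue, startwork) as queue_ms,  -- time spent waiting in the commit queue
    datediff(milliseconds, startwork, endtime) as commit_ms     -- time spent performing the commit
from stl_commit_stats
where node = -1
    and endtime >= dateadd(hour, -1, getdate())
order by endtime desc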

There are at least 4 transactions as part of the view materialization:

  1. Drop the intermediate relation
  2. Drop the temp relation
  3. Create new view; swap names with old view
  4. Drop old view

Best case, on RA3, this sequence will take 4 seconds per view; in multiple threads, as these sequences stack up in the transaction queue, it will take much longer (in clock time).
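Concretely, the statements Redshift ends up running look roughly like this (a sketch; the analytics.my_view relation and the __dbt_tmp/__dbt_backup suffixes are illustrative), with each numbered step committing on its own:

-- 1. drop the intermediate relation (commits on its own)
drop view if exists analytics.my_view__dbt_tmp cascade;

-- 2. drop the temp/backup relation (commits on its own)
drop view if exists analytics.my_view__dbt_backup cascade;

-- 3. create the new view and swap names with the old one (one transaction)
begin;
create view analytics.my_view__dbt_tmp as
    select 1 as placeholder_column;  -- stands in for the compiled model SQL
alter table analytics.my_view rename to my_view__dbt_backup;
alter table analytics.my_view__dbt_tmp rename to my_view;
commit;

-- 4. drop the old view (commits on its own)
drop view if exists analytics.my_view__dbt_backup cascade;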

Steps To Reproduce

On a Redshift cluster with RA3 nodes (I’m on RA3.xplus x4), execute dbt run in multiple threads in a project with many views.

Expected behavior

Views should ideally take at most a second or two to be materialized.

Above, the four transactions to create a view could be compressed into a single transaction. I don’t think this would have other side-effects, as any one of these operations failing would cause the run step to fail anyway.

To achieve this, I think we would have to modify the materialization macro(s) so dropping the other relations happens inside the same commit. It may be sufficient to add an explicit BEGIN in the materialization.
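In other words, the compressed version would look something like this (same illustrative names as in the sketch above), issuing a single COMMIT per view:

begin;
drop view if exists analytics.my_view__dbt_tmp cascade;
drop view if exists analytics.my_view__dbt_backup cascade;
create view analytics.my_view__dbt_tmp as
    select 1 as placeholder_column;  -- stands in for the compiled model SQL
alter table analytics.my_view rename to my_view__dbt_backup;
alter table analytics.my_view__dbt_tmp rename to my_view;
drop view if exists analytics.my_view__dbt_backup cascade;
commit;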

Potential workarounds

On dbt v0.18+, we can avoid rebuilding views using the new selection syntax. However, this introduces some complexity and the potential for deployment errors when views change.
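For example, assuming this refers to the state-based selection added in v0.18, and that production artifacts are available at an illustrative path like ./prod-artifacts, a deployment could rebuild only models whose compiled SQL has changed:

dbt run --models state:modified+ --state ./prod-artifacts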

As an alternative, we can use more ephemeral models and fewer views, if the views do not need to be exposed in other contexts; this can make debugging more difficult.
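For reference, making a model ephemeral is a one-line config change (a sketch; the model and ref names are illustrative), after which dbt inlines it as a CTE into downstream models and issues no view DDL or COMMIT for it:

-- models/staging/stg_orders.sql (illustrative)
{{ config(materialized='ephemeral') }}

select * from {{ ref('raw_orders') }}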

Screenshots and log output

Here’s a recent run in dbt Cloud (screenshot of the run timing omitted).

Investigating the logs, it’s clear the bottleneck is committing transactions. These are the individual statements executed for the START view model analytics.logistics_ops_sectors line above (screenshot of the per-statement timings omitted).

Source for logs is svl_statementtext:

with statements as (

    select
        *,
        -- duration of each statement, in seconds
        (datediff(milliseconds, starttime, endtime)::float / 1000)::numeric(10, 3) as dur_s,
        -- gap between the previous statement's end and this statement's start, in seconds
        (datediff(milliseconds, lag(endtime, 1) over (order by starttime), starttime)::float
            / 1000)::numeric(10, 3) as wait_s

    from svl_statementtext
    where
        -- matches the specific prod run above
        starttime >= '2021-02-17 08:04:17.124262'
        and starttime <= '2021-02-17 08:04:44.216802'
        and pid = 17738
        and sequence = 0

    order by starttime asc

),

totals as (

    -- one summary row; nulls pad the columns of svl_statementtext so the union lines up
    select
        null::int,
        null::int,
        null::int,
        null::varchar,
        getdate()::timestamp,
        null::timestamp,
        null::int,
        null::varchar,
        ' --------- TOTAL ---------'::varchar,
        sum(dur_s),
        sum(wait_s)

    from statements

),

unioned as (

    select * from statements
    union all
    select * from totals

)

select * from unioned order by starttime

System information

Which database are you using dbt with?

  • redshift

The output of dbt --version:

0.17.1

The operating system you’re using: Windows (but reproduces on dbt Cloud)

The output of python --version: 3.7.9

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 7
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
tconbeer commented, Aug 9, 2021

Yes, it’s still “slow” after this change, but not as bad… should be up to 50% reduction in clock time. Best case creating a view still requires 2 commits, which should take ~2 seconds without any other workloads on the cluster, but more likely 2-4x that if you’re running dbt in multiple threads or have concurrent workloads.

1 reaction
tconbeer commented, Jun 11, 2021

Coming back to this with some fresh eyes. Instead of modifying the cascade or transaction behavior, I think a better solution is for the materialization to:

  1. check the relational cache to see whether these temp relations exist,
  2. try to drop ... cascade them only if they already exist. (no-op otherwise)

I think this would be simple and fast using adapter.get_relation on the “intermediate” and “backup” relation names.
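A minimal sketch of that check inside the materialization (assuming the backup_relation object is already in scope, as it is today; not meant as the final implementation):

{%- set existing_backup = adapter.get_relation(
        database=backup_relation.database,
        schema=backup_relation.schema,
        identifier=backup_relation.identifier) -%}

{%- if existing_backup is not none -%}
    {% do adapter.drop_relation(existing_backup) %}
{%- endif -%}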

This would introduce a race condition if there was another process modifying the same relations at roughly the same time. But that problem exists with the current implementation, since the drop is done before the temp relation is needed (in a different transaction). And I think if you have multiple processes running dbt on the same models at the same time, all bets are off, anyway.

We may as well make the same change in the table materialization, since the mechanics are the same.

Redshift and Postgres use the materializations from the global_project, so I propose we change it there; Snowflake and BQ have their own materializations.

I can open a PR with this change if we’re good on this solution.

Read more comments on GitHub >
