A stateless and fast-initializing dbt RPC server
Describe the feature
Hey folks!
We’re experimenting with dbt RPC on Cloud Run (Google’s serverless Docker-based container service) at our company.
However, the dbt RPC implementation has a couple of limitations that prevent it from being deployed there.
- When turning on the RPC server with `dbt rpc`, the server performs an initial compilation step. This step can be sluggish for large projects. While that is usually not an issue when dbt RPC runs on a VM (EC2, GCE, etc.), it becomes a problem on Cloud Run, because the abstraction Cloud Run provides involves spinning up new containers (and therefore new dbt RPC servers) when load spikes.
- The asynchronous nature of many of the dbt RPC server tasks/methods does not fit the Cloud Run stateless model. Because Cloud Run operates at a container-level abstraction, it cannot guarantee that polling requests will reach the same container that kicked off the job (a sketch of the polling flow follows this list). The assumption with Cloud Run is that request/response happens in a single transaction; usually folks use Cloud Run for serving RESTful APIs.
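To illustrate the second point: today a `compile_sql` request returns a `request_token`, and the result is fetched later via the `poll` method. A poll request looks roughly like the sketch below (the id and token values here are made up), and on Cloud Run it only succeeds if it happens to be routed back to the container that is holding that task's state:

```json
{
  "jsonrpc": "2.0",
  "method": "poll",
  "id": "0b8c6a51-1111-2222-3333-444444444444",
  "params": {
    "request_token": "f86926fa-5555-6666-7777-888888888888",
    "logs": false
  }
}
```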
Can I suggest some enhancements that would make a dbt RPC server deployable on this platform, and more widely deployable to other services that naturally breathe with incoming load by spinning up container clones?
- It would be great if we could compile our project upfront, once, for the RPC server: perhaps a `dbt compile` at Docker build time and a `dbt rpc --cached` flag that bootstraps the server from disk instead of compiling at `dbt rpc` runtime (see the Dockerfile sketch after the payload examples below). At least for our application, the models/macros do not change once a release is made, so a one-time project compile is actually safe.
- An additional field on the asynchronous tasks/methods that lets the user request synchronous operation. For our specific use case we only need the `compile_sql` task/method, but the idea could be extended to the other tasks/methods. Perhaps an additional field in the `params` object could indicate that you want a synchronous response.
Currently, the payloads look like:
```json
{
  "jsonrpc": "2.0",
  "method": "compile_sql",
  "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d",
  "params": {
    "timeout": 60,
    "sql": "c2VsZWN0IHt7IDEgKyAxIH19IGFzIGlk",
    "name": "my_first_query"
  }
}
```
Could be:
```json
{
  "jsonrpc": "2.0",
  "method": "compile_sql",
  "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d",
  "params": {
    "timeout": 60,
    "sql": "c2VsZWN0IHt7IDEgKyAxIH19IGFzIGlk",
    "name": "my_first_query",
    "synchronous": true
  }
}
```
At least for the `compile_sql` task, even very complex models/macros usually return in under 1 second for us, so asynchronous operation (polling) is usually overkill.
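To make the first suggestion concrete, here is a rough Dockerfile sketch of what we have in mind. The `--cached` flag does not exist today (it is the proposed behaviour), and the base image, package name, and paths are just illustrative assumptions:

```dockerfile
FROM python:3.8-slim

# Illustrative install; pin whatever dbt version the project actually uses.
RUN pip install dbt

WORKDIR /app
COPY . /app

# One-time compilation baked into the image at build time.
RUN dbt compile --profiles-dir /app/profiles

# Proposed flag: bootstrap from the compiled artifacts on disk
# instead of re-compiling the whole project at container start-up.
CMD ["dbt", "rpc", "--cached", "--host", "0.0.0.0", "--port", "8580"]
```

With something like this, a container that Cloud Run spins up under load could start answering `compile_sql` requests almost immediately.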
Describe alternatives you’ve considered
We currently run dbt RPC on Google Compute Engine, but it's more management overhead than we'd like.
Who will this benefit?
Data Engineers looking to deploy dbt RPC in a serverless Docker-based environment.
Are you interested in contributing this feature?
Personally, my python-fu is pretty weak, but we'd be super interested in testing and providing feedback.
Issue Analytics
- Created 3 years ago
- Reactions: 3
- Comments: 8 (3 by maintainers)
I’m going to close this issue. I will say that this topic (a fast-initializing, reliable, and “stateless” server) is something we’ve been thinking and talking about a lot lately, as we plan for the next-generation dbt Server.
@hugohjerten, sorry for the delay.
Yes and no. We were unsuccessful in getting dbt RPC running cleanly in Cloud Run.
We took a different approach instead. Rather than call the dbt RPC `compile_sql` method at runtime, we start the dbt RPC server while building our Docker image and pre-compile all the macro argument combinations, saving the results to files inside the image.
We then look up those pre-compiled templates at runtime by a conventional filename, built from the macro name and all of the key/value pairs that produced it.
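A rough sketch of that lookup, with all names and paths made up for illustration (not our actual code):

```python
from pathlib import Path

# Hypothetical location of the templates pre-compiled at image build time.
COMPILED_DIR = Path("/app/compiled_templates")

def template_filename(macro_name: str, args: dict) -> str:
    """Build the conventional filename: macro name plus each key/value pair."""
    parts = [macro_name] + [f"{key}={value}" for key, value in sorted(args.items())]
    return "__".join(parts) + ".sql"

def lookup_compiled_sql(macro_name: str, args: dict) -> str:
    """Read the pre-compiled SQL for this macro/argument combination."""
    return (COMPILED_DIR / template_filename(macro_name, args)).read_text()

# e.g. lookup_compiled_sql("revenue_by_day", {"country": "SE", "currency": "EUR"})
# reads /app/compiled_templates/revenue_by_day__country=SE__currency=EUR.sql
```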
It works for us!