feat: Add support for non-DB state backends (`s3`, `dynamodb`, etc.)

Spec discussion:

https://github.com/meltano/meltano/discussions/6270

I think we could introduce state backends such as s3, dynamodb, and other backends that offer better reliability than an RDBMS and near-zero always-on cost.

In the future, this could extend to a Meltano-managed state offering, similar to Pulumi’s default experience.

_Originally posted by @aaronsteers in https://github.com/meltano/meltano/issues/2520#issuecomment-1145062214_

Additional context:

Currently, the ‘current’ STATE is a composite built by an on-demand scan through job history records. This is not ideal in general, and we have #3340 logged to refactor it so that a single table row would be the “backend” to read from and write to.
Moving to a generic backend store would likely also require solving #3340, or at least the same refactoring would likely be needed in both cases to eliminate the need to scan history logs.
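To make the cost of the composite scan concrete, here is a simplified Python model of rebuilding the ‘current’ state from history. It assumes a hypothetical record shape ({"completed": bool, "state": dict}, ordered oldest first) and a recursive dict merge for partial states; it illustrates the idea only and is not Meltano’s actual code. Every lookup replays the history, which is the work that a single state row per ID (#3340) would avoid.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from typing import Any


def deep_merge(base: Mapping[str, Any], update: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict with `update` merged into `base`.

    Nested dicts are merged recursively; any other value in `update` wins.
    Neither input is mutated.
    """
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def state_from_history(records: Iterable[Mapping[str, Any]]) -> dict[str, Any]:
    """Rebuild the 'current' state by replaying history records (hypothetical shape).

    A completed record resets the state; a partial record is merged on top of it.
    """
    current: dict[str, Any] = {}
    for record in records:
        if record["completed"]:
            current = dict(record["state"])
        else:
            current = deep_merge(current, record["state"])
    return current
```

For example, a complete state for a users stream followed by two partial runs yields the users bookmark from the latest partial run plus the orders bookmark from the earlier one.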

Workarounds:

#2520 discusses potential workarounds, but the current workaround is basically to:

1. Run meltano state get ... to pull the latest state into a file.
2. Upload that file to S3 before the container is deleted.
3. On the next run, download the file from S3 and load it into the system database with meltano state set ...

This works with or without Postgres or another long-lived RDBMS, since the built-in SQLite database is created on the fly if no Postgres database is specified; the process above simply removes the need to keep long-term state in the SQLite backend.
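The workaround can be scripted. Below is a minimal Python sketch, assuming that meltano state get <state_id> prints the state JSON to stdout and that meltano state set <state_id> <json> accepts it back as an argument (check meltano state --help for your version; set may also ask for confirmation). The upload and download callables are hypothetical hooks; in practice they could wrap boto3’s put_object and get_object. The command runner is injected so the logic can be tested without Meltano installed.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional, Sequence

# A runner executes a command and returns its stdout. It is injected so the
# sketch can be exercised without Meltano installed.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, raising CalledProcessError on failure."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def save_state(state_id: str, upload: Callable[[bytes], None], run: Runner = run_cli) -> None:
    """First two steps: pull the latest state and upload it before the container exits.

    Assumes `meltano state get <state_id>` prints only the state JSON on stdout.
    """
    state_json = run(["meltano", "state", "get", state_id])
    upload(state_json.strip().encode("utf-8"))


def restore_state(
    state_id: str, download: Callable[[], Optional[bytes]], run: Runner = run_cli
) -> bool:
    """Last step: download the saved state and load it into the fresh system DB.

    Returns False when nothing was saved yet (e.g. the very first run).
    Assumes `meltano state set <state_id> <json>` takes the state as an argument.
    """
    payload = download()
    if payload is None:
        return False
    run(["meltano", "state", "set", state_id, payload.decode("utf-8")])
    return True
```

Wrapping the pipeline as save_state after the run and restore_state before it keeps the ephemeral SQLite database as a pure cache, with S3 holding the only durable copy.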

Issue Analytics

State:
Created a year ago
Reactions: 3
Comments: 12 (3 by maintainers)

Top GitHub Comments

cjohnhanson commented, Aug 11, 2022

There have been some synchronous discussions on this, so I’m documenting the results here.

The v1 of this feature is going to be to use a third-party library (likely smart_open) that will allow us to support users configuring state backends in the form of a simple URI, e.g. s3://some_bucket/some_prefix, where both partial and complete state files can be written. This means we’ll also need to implement some barebones locking mechanism. Locking doesn’t need to be too sophisticated at first because running the same pipeline using the same backend concurrently in separate deployments should be a pretty rare use case, and those users can still use the existing system db state backend to get more deterministic behavior. Plus, the meltano state commands allow users to manually edit, clear, copy, or merge state to fix any issues that arise from concurrent runs.
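A minimal sketch of a URI-configured backend with a barebones lock, using only the Python standard library. It assumes a local file:// URI in place of smart_open (which would add s3:// and other schemes); the class name, the complete.json/partial.json layout, and the lock-file scheme are hypothetical and not Meltano’s implementation. The lock relies on O_CREAT | O_EXCL, which is atomic on a local filesystem; object stores do not necessarily offer the same guarantee, so an S3 backend would need its own approach.

```python
from __future__ import annotations

import json
import os
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator
from urllib.parse import urlparse
from urllib.request import url2pathname


class FileStateBackend:
    """Hypothetical state backend configured by a file:// URI.

    Assumed layout: <root>/<state_id>/complete.json, partial.json and a lock file.
    """

    def __init__(self, uri: str, lock_timeout: float = 10.0) -> None:
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            # smart_open (or similar) would be the place to handle s3:// etc.
            raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
        self.root = Path(url2pathname(parsed.path))
        self.lock_timeout = lock_timeout

    def _dir(self, state_id: str) -> Path:
        path = self.root / state_id
        path.mkdir(parents=True, exist_ok=True)
        return path

    @contextmanager
    def lock(self, state_id: str) -> Iterator[None]:
        """Barebones lock: atomically create a lock file, retrying until lock_timeout.

        A crashed run leaves a stale lock behind; a real version would need expiry.
        """
        lock_path = self._dir(state_id) / "lock"
        deadline = time.monotonic() + self.lock_timeout
        while True:
            try:
                # O_EXCL makes creation fail if another run already holds the lock.
                os.close(os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
                break
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"state {state_id!r} is locked by another run") from None
                time.sleep(0.1)
        try:
            yield
        finally:
            lock_path.unlink()

    def write(self, state_id: str, state: dict[str, Any], complete: bool) -> None:
        """Write complete or partial state; a complete write discards any partial state.

        Partial writes simply overwrite the previous partial file (no merging here).
        """
        directory = self._dir(state_id)
        if complete:
            (directory / "complete.json").write_text(json.dumps(state))
            (directory / "partial.json").unlink(missing_ok=True)
        else:
            (directory / "partial.json").write_text(json.dumps(state))

    def read(self, state_id: str, complete: bool) -> dict[str, Any] | None:
        path = self._dir(state_id) / ("complete.json" if complete else "partial.json")
        return json.loads(path.read_text()) if path.exists() else None
```

A run would wrap its writes in "with backend.lock(state_id):", so a second deployment running the same pipeline waits and then fails loudly instead of silently interleaving state.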

Creating state backends is going to require decoupling state from job history, so we’ll need to tackle https://github.com/meltano/meltano/issues/3340 before getting started on the actual state backend implementation.

This implementation will be done in such a way as to lay the foundation for user contributed state backend plugins in some future iteration, but “pluggable” backends are out of scope for the time being. The URI approach solves for a huge number of use cases and takes us one step closer to eliminating the need for a postgres backend in production deployments without the heavy lift of supporting arbitrary plugins.
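One way to keep the URI approach open to future pluggable backends is a registry keyed by URI scheme. The sketch below uses hypothetical names (register_backend, backend_for_uri, InMemoryBackend) that do not come from Meltano; a plugin would then only need to register a factory for its scheme.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional, Protocol
from urllib.parse import urlparse


class StateBackend(Protocol):
    """Minimal interface every backend would implement (hypothetical)."""

    def get(self, state_id: str) -> Optional[dict]: ...

    def set(self, state_id: str, state: dict) -> None: ...


class InMemoryBackend:
    """Toy backend used only to illustrate registration."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self._states: Dict[str, dict] = {}

    def get(self, state_id: str) -> Optional[dict]:
        return self._states.get(state_id)

    def set(self, state_id: str, state: dict) -> None:
        self._states[state_id] = state


# Maps a URI scheme (e.g. "s3", "file") to a factory that builds a backend from the URI.
BACKENDS: Dict[str, Callable[[str], StateBackend]] = {}


def register_backend(scheme: str, factory: Callable[[str], StateBackend]) -> None:
    BACKENDS[scheme] = factory


def backend_for_uri(uri: str) -> StateBackend:
    """Pick a backend from the URI scheme, failing clearly for unknown schemes."""
    scheme = urlparse(uri).scheme
    try:
        factory = BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"no state backend registered for scheme {scheme!r}") from None
    return factory(uri)
```

Built-in backends and future plugins would go through the same lookup, so adding a backend never requires changes to the code that reads or writes state.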

cjohnhanson commented, Sep 2, 2022

@aaronsteers –

First PR. Launch the first state backend, which would be systemdb and the new state table in https://github.com/meltano/meltano/issues/3340. Doesn’t need configuration. (Would track/rework in https://github.com/meltano/meltano/issues/3340 as a dependency for external backends discussed here.)

Second PR. Add second state backend, for instance based on PyFilesystem or smart_open. Should include method of configuring. (Can use this issue, aka https://github.com/meltano/meltano/issues/5981.)

100%, that’s exactly the plan.
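A sketch of the first PR’s idea: a system-DB backend with a dedicated state table holding one row per state ID, using Python’s built-in sqlite3. The table and column names (job_state, state_json, updated_at) are hypothetical, and a real schema would probably also keep partial state separately; the point is that reads and writes touch a single row instead of scanning job history. The upsert uses SQLite’s ON CONFLICT ... DO UPDATE syntax, available since SQLite 3.24.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any, Optional

# Hypothetical schema: one row per state ID, no job-history scan needed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_state (
    state_id   TEXT PRIMARY KEY,
    state_json TEXT NOT NULL,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


class SystemDBStateBackend:
    """Hypothetical system-DB state backend; needs no configuration beyond the DB."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with self.conn:
            self.conn.execute(SCHEMA)

    def set(self, state_id: str, state: dict[str, Any]) -> None:
        """Insert or overwrite the single row for this state ID."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO job_state (state_id, state_json) VALUES (?, ?) "
                "ON CONFLICT(state_id) DO UPDATE SET "
                "state_json = excluded.state_json, updated_at = CURRENT_TIMESTAMP",
                (state_id, json.dumps(state)),
            )

    def get(self, state_id: str) -> Optional[dict[str, Any]]:
        row = self.conn.execute(
            "SELECT state_json FROM job_state WHERE state_id = ?", (state_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def clear(self, state_id: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM job_state WHERE state_id = ?", (state_id,))
```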

Third PR. Add --from_backend and --to_backend support to meltano state copy|move CLI commands. (Not logged yet.)

Yeah, I think that makes sense as the next step and should be pretty straightforward to implement after these changes.
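Once every backend exposes the same get/set/clear operations, copying or moving state between backends reduces to reading from one and writing to the other. A minimal sketch, assuming a hypothetical three-method interface and an in-memory stand-in backend; it does not reflect how --from_backend/--to_backend is actually implemented.

```python
from __future__ import annotations

from typing import Any, Dict, Optional, Protocol


class StateBackend(Protocol):
    """Hypothetical common interface shared by all backends."""

    def get(self, state_id: str) -> Optional[Dict[str, Any]]: ...

    def set(self, state_id: str, state: Dict[str, Any]) -> None: ...

    def clear(self, state_id: str) -> None: ...


class DictBackend:
    """In-memory stand-in for a real backend (system DB, file, s3, ...)."""

    def __init__(self) -> None:
        self.states: Dict[str, Dict[str, Any]] = {}

    def get(self, state_id: str) -> Optional[Dict[str, Any]]:
        return self.states.get(state_id)

    def set(self, state_id: str, state: Dict[str, Any]) -> None:
        self.states[state_id] = dict(state)

    def clear(self, state_id: str) -> None:
        self.states.pop(state_id, None)


def copy_state(
    src: StateBackend, dst: StateBackend, src_id: str, dst_id: Optional[str] = None
) -> None:
    """Copy one state ID from `src` to `dst` (optionally under a new ID)."""
    state = src.get(src_id)
    if state is None:
        raise KeyError(f"no state stored for {src_id!r}")
    dst.set(dst_id or src_id, state)


def move_state(
    src: StateBackend, dst: StateBackend, src_id: str, dst_id: Optional[str] = None
) -> None:
    """Copy, then clear the source, unless source and target are the same entry."""
    target_id = dst_id or src_id
    copy_state(src, dst, src_id, target_id)
    if not (src is dst and target_id == src_id):
        src.clear(src_id)
```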
