feat: Add support for non-DB state backends (`s3`, `dynamodb`, etc.)

Spec discussion:

I think we could introduce state backends such as S3, DynamoDB, and others that offer better reliability than an RDBMS and near-zero always-on cost.

In the future, we could also provide a Meltano-managed state offering, similar to Pulumi’s default experience.

_Originally posted by @aaronsteers in https://github.com/meltano/meltano/issues/2520#issuecomment-1145062214_

Additional context:

  1. Currently, the ‘current’ STATE is computed on demand by scanning through history records and compositing them. This is not ideal in general, and we have #3340 logged to refactor this so that a single table row would be the “backend” to read from and write to.
  2. Moving to a generic backend store would likely also require solving #3340, or at least the same refactoring would (likely) be needed in both cases to eliminate the need to scan history logs.
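To make the "composite scan" concrete, here is a rough sketch of what deriving the current state from history can look like. The record shape (`payload`, `complete`) and the merge rule (a complete payload replaces everything, a partial payload is merged on top) are assumptions for illustration only, not Meltano's actual schema or code.

```python
def merge(base, update):
    """Recursively merge `update` into a copy of `base`; `update` wins on conflicts."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def current_state(history):
    """Derive the current state by scanning history records, oldest first.

    Each record is a hypothetical dict: {"payload": {...}, "complete": bool}.
    """
    state = {}
    for record in history:
        if record["complete"]:
            # A complete run supersedes everything seen before it.
            state = dict(record["payload"])
        else:
            # A partial run only adds to or overrides what is already known.
            state = merge(state, record["payload"])
    return state


history = [
    {"payload": {"bookmarks": {"a": {"pos": 1}}}, "complete": True},
    {"payload": {"bookmarks": {"b": {"pos": 5}}}, "complete": False},
]
print(current_state(history))
```

With a single-row "backend" as proposed in #3340, this scan would presumably run once at write time instead of on every read.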

Workarounds:

#2520 discusses potential workarounds, but basically the current workaround is to:

  1. Run `meltano state get...` to pull the latest state into a file.
  2. Upload that file to S3 before the container is deleted.
  3. On the next run, download the file from S3 and load it into the system DB with `meltano state set ...`.
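The three steps above could be scripted roughly as follows. This is only a sketch: the bucket, key, and state ID are placeholders, the helper names are made up, and it assumes `boto3` is installed and that `meltano state set` accepts `--force` and `--input-file` as in the Meltano CLI docs (worth checking against your installed version).

```python
import subprocess
import tempfile
from pathlib import Path


def state_get_cmd(state_id):
    """Command that prints the current state for `state_id` to stdout."""
    return ["meltano", "state", "get", state_id]


def state_set_cmd(state_id, path):
    """Command that loads state from a file; --force is assumed to skip the prompt."""
    return ["meltano", "state", "set", "--force", state_id, "--input-file", str(path)]


def _default_s3():
    import boto3  # imported lazily so this module loads without boto3 installed

    return boto3.client("s3")


def save_state(state_id, bucket, key, run=subprocess.run, s3=None):
    """Steps 1 and 2: dump the state and upload it before the container goes away."""
    s3 = s3 or _default_s3()
    result = run(state_get_cmd(state_id), check=True, capture_output=True, text=True)
    s3.put_object(Bucket=bucket, Key=key, Body=result.stdout.encode("utf-8"))


def restore_state(state_id, bucket, key, run=subprocess.run, s3=None):
    """Step 3: download the state and load it into the (fresh) system DB."""
    s3 = s3 or _default_s3()
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "state.json"
        path.write_bytes(body)
        run(state_set_cmd(state_id, path), check=True)
```

Calling `restore_state(...)` before the pipeline and `save_state(...)` after it would bracket each container run. The `run` and `s3` parameters exist only so the helpers can be exercised without Meltano or AWS.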

This works with or without a Postgres or other long-lived RDBMS, since the built-in SQLite implementation is created on the fly if no Postgres database is specified, and the process above essentially just removes the long-term state storage requirement from the SQLite backend.

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

2 reactions
cjohnhanson commented, Aug 11, 2022

There have been some synchronous discussions on this, so I'm documenting the results of those here.

The v1 of this feature is going to be to use a third-party library (likely smart_open) that will allow us to support users configuring state backends in the form of a simple URI, e.g. s3://some_bucket/some_prefix, where both partial and complete state files can be written. This means we’ll also need to implement some barebones locking mechanism. Locking doesn’t need to be too sophisticated at first because running the same pipeline using the same backend concurrently in separate deployments should be a pretty rare use case, and those users can still use the existing system db state backend to get more deterministic behavior. Plus, the meltano state commands allow users to manually edit, clear, copy, or merge state to fix any issues that arise from concurrent runs.
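For a feel of what "a simple URI plus barebones locking" might look like, here is a minimal sketch restricted to `file://` URIs and the standard library. The class name, file layout (`complete.json`, `partial.json`, `lock`), and timeout are all invented for illustration; a real implementation would presumably swap the built-in file calls for something like `smart_open` to reach `s3://` and friends, and an exclusive-create lock like this one is not a safe primitive on every object store.

```python
import json
import os
import time
from contextlib import contextmanager
from pathlib import Path
from urllib.parse import urlparse


class LocalStateBackend:
    """Hypothetical URI-configured state backend (file:// only in this sketch)."""

    def __init__(self, uri, lock_timeout=5.0):
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
        self.root = Path(parsed.path)
        self.lock_timeout = lock_timeout

    def _path(self, state_id, complete):
        name = "complete.json" if complete else "partial.json"
        return self.root / state_id / name

    @contextmanager
    def lock(self, state_id):
        """Barebones lock: exclusively create a lock file, retrying until timeout."""
        directory = self.root / state_id
        directory.mkdir(parents=True, exist_ok=True)
        lock_path = directory / "lock"
        deadline = time.monotonic() + self.lock_timeout
        while True:
            try:
                fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                break
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock state {state_id!r}")
                time.sleep(0.05)
        try:
            yield
        finally:
            os.close(fd)
            lock_path.unlink()

    def write(self, state_id, state, complete=True):
        """Write a complete or partial state file under the lock."""
        with self.lock(state_id):
            self._path(state_id, complete).write_text(json.dumps(state))

    def read(self, state_id, complete=True):
        """Return the stored state, or None if nothing has been written yet."""
        path = self._path(state_id, complete)
        with self.lock(state_id):
            return json.loads(path.read_text()) if path.exists() else None
```

A stale lock file left behind by a crashed run would block later runs until it is removed by hand, which is the kind of rough edge the comment accepts for a v1.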

Creating state backends is going to require decoupling state from job history, so we’ll need to tackle https://github.com/meltano/meltano/issues/3340 before getting started on the actual state backend implementation.

This implementation will be done in such a way as to lay the foundation for user-contributed state backend plugins in some future iteration, but “pluggable” backends are out of scope for the time being. The URI approach solves for a huge number of use cases and takes us one step closer to eliminating the need for a Postgres backend in production deployments, without the heavy lift of supporting arbitrary plugins.

1 reaction
cjohnhanson commented, Sep 2, 2022

@aaronsteers

100%, that’s exactly the plan.

> Third PR. Add `--from_backend` and `--to_backend` support to `meltano state copy|move` CLI commands. (Not logged yet.)

Yeah, I think that makes sense as the next step and should be pretty straightforward to implement after these changes.
