feat: Add support for non-DB state backends (`s3`, `dynamodb`, etc.)
Spec discussion:
I think we could introduce state backends such as `s3`, `dynamodb`, and other backends that offer better reliability than an RDBMS and near-zero always-on cost.
In the future, we could also offer a Meltano-managed state service, similar to Pulumi’s default experience.
_Originally posted by @aaronsteers in https://github.com/meltano/meltano/issues/2520#issuecomment-1145062214_
Additional context:
- Currently, the ‘current’ STATE is computed on demand as a composite scan through job history records. This is not ideal in general, and #3340 is logged to refactor this so that a single table row becomes the “backend” that state is read from and written to.
- Moving to a generic backend store would likely also require solving #3340, or at least the same refactoring would likely be needed in both cases to eliminate the need to scan history logs.
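To make the difference between the two models concrete, here is a minimal sketch. It assumes a simplified model in which each job-history record carries either a complete or a partial state payload, and partial payloads are shallow-merged over the most recent complete one. `HistoryRecord`, `SingleRowStateStore`, and this merge rule are illustrative assumptions, not Meltano's actual schema or merge logic.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class HistoryRecord:
    """Hypothetical job-history row: a run ID plus a complete or partial state payload."""

    run_id: int
    state: Dict[str, Any]
    is_partial: bool = False


def composite_state(history: List[HistoryRecord]) -> Dict[str, Any]:
    """Simplified 'scan on read': rebuild the current state from history on every read."""
    merged: Dict[str, Any] = {}
    for record in sorted(history, key=lambda r: r.run_id):
        if record.is_partial:
            merged.update(record.state)  # layer partial state on top of the base
        else:
            merged = dict(record.state)  # a complete state replaces the base
    return merged


class SingleRowStateStore:
    """Simplified single-row model: one stored state per state ID, read and written directly."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def get(self, state_id: str) -> Dict[str, Any]:
        return dict(self._rows.get(state_id, {}))

    def set(self, state_id: str, state: Dict[str, Any], partial: bool = False) -> None:
        # Merging happens once, at write time, so reads never need to scan history.
        if partial:
            merged = self.get(state_id)
            merged.update(state)
            self._rows[state_id] = merged
        else:
            self._rows[state_id] = dict(state)


history = [
    HistoryRecord(1, {"a": 1}),
    HistoryRecord(2, {"b": 2}, is_partial=True),
    HistoryRecord(3, {"a": 3}, is_partial=True),
]
print(composite_state(history))  # {'a': 3, 'b': 2}
```

The cost of `composite_state` grows with the length of the history, while `SingleRowStateStore.get` is a single lookup. A single-row model maps naturally onto a key/object store such as S3 or DynamoDB, which is why the two pieces of work overlap.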
Workarounds:
#2520 discusses potential workarounds, but essentially the current workaround is to:
- Run `meltano state get ...` to pull the latest state into a file.
- Upload that file to S3 before the container is deleted.
- On the next run, download the file from S3 and load it into the system db with `meltano state set ...`.

This works with or without Postgres or another long-lived RDBMS. The built-in SQLite system db is created on the fly if no Postgres database is specified, and the process above simply removes the requirement for SQLite to store state long-term.
Top GitHub Comments
There have been some synchronous discussions on this–documenting the results of those here.
The v1 of this feature will use a third-party library (likely `smart_open`) so that users can configure state backends as a simple URI, e.g. `s3://some_bucket/some_prefix`, where both partial and complete state files can be written. This means we’ll also need to implement a barebones locking mechanism. Locking doesn’t need to be very sophisticated at first. Running the same pipeline against the same backend concurrently in separate deployments should be a fairly rare use case, and those users can still use the existing system db state backend to get more deterministic behavior. Plus, the `meltano state` commands let users manually edit, clear, copy, or merge state to fix any issues that arise from concurrent runs.

Creating state backends is going to require decoupling state from job history, so we’ll need to tackle https://github.com/meltano/meltano/issues/3340 before getting started on the actual state backend implementation.
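A barebones lock of the kind described above could look like the following minimal sketch. It uses a `file://` URI as a local stand-in for `s3://some_bucket/some_prefix` and an exclusive-create lock file. The helper names (`state_lock`, `read_state`, `write_state`) and the `<state_id>.json` / `<state_id>.lock` layout are hypothetical and are not Meltano's implementation. Object stores do not provide `O_EXCL`, so a real S3 backend would need a different atomic primitive, which this sketch does not attempt.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Dict, Iterator
from urllib.parse import urlparse
from urllib.request import url2pathname


def _state_dir(uri: str) -> Path:
    """Map a file:// URI (local stand-in for s3://bucket/prefix) to a directory."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"this sketch only handles file:// URIs, got {uri!r}")
    return Path(url2pathname(parsed.path))


@contextmanager
def state_lock(uri: str, state_id: str, timeout: float = 10.0) -> Iterator[None]:
    """Hold an exclusive lock file for state_id, waiting up to `timeout` seconds."""
    lock_path = _state_dir(uri) / f"{state_id}.lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists (another run holds it).
            os.close(os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"state {state_id!r} is locked by another run")
            time.sleep(0.05)
    try:
        yield
    finally:
        os.unlink(lock_path)  # release the lock even if the write fails


def read_state(uri: str, state_id: str) -> Dict[str, Any]:
    path = _state_dir(uri) / f"{state_id}.json"
    return json.loads(path.read_text()) if path.exists() else {}


def write_state(
    uri: str, state_id: str, state: Dict[str, Any], partial: bool = False
) -> Dict[str, Any]:
    """Write complete state, or merge partial state over what is stored, under the lock."""
    state_dir = _state_dir(uri)
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{state_id}.json"
    with state_lock(uri, state_id):
        new_state = {**read_state(uri, state_id), **state} if partial else dict(state)
        tmp_path = path.with_name(path.name + ".tmp")
        tmp_path.write_text(json.dumps(new_state))
        os.replace(tmp_path, path)  # atomic rename on POSIX within one filesystem
    return new_state


if __name__ == "__main__":
    uri = Path(tempfile.mkdtemp()).as_uri()
    write_state(uri, "example-state-id", {"a": 1})
    print(write_state(uri, "example-state-id", {"b": 2}, partial=True))  # {'a': 1, 'b': 2}
```

A run that crashes while holding the lock leaves a stale `.lock` file behind. Handling that case (for example with lock expiry) is deliberately left out, in keeping with the "barebones" scope. The `meltano state` commands mentioned above remain the manual escape hatch.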
This implementation will be done in such a way as to lay the foundation for user contributed state backend plugins in some future iteration, but “pluggable” backends are out of scope for the time being. The URI approach solves for a huge number of use cases and takes us one step closer to eliminating the need for a postgres backend in production deployments without the heavy lift of supporting arbitrary plugins.
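One way to lay that foundation is to dispatch on the URI scheme through a registry, so that later plugins could register additional schemes. This is a hypothetical sketch: `StateBackend`, `register_backend`, `backend_for`, and the toy `memory://` backend are illustrative names, not Meltano's API. A real `s3://` backend would presumably delegate its I/O to `smart_open`, as the comment above suggests.

```python
from typing import Any, Callable, Dict, Protocol
from urllib.parse import urlparse


class StateBackend(Protocol):
    """Minimal interface every backend would implement (hypothetical)."""

    def get(self, state_id: str) -> Dict[str, Any]: ...

    def set(self, state_id: str, state: Dict[str, Any]) -> None: ...


# Maps a URI scheme ("s3", "file", ...) to a factory that builds a backend from the full URI.
_BACKENDS: Dict[str, Callable[[str], StateBackend]] = {}


def register_backend(*schemes: str) -> Callable[[type], type]:
    """Class decorator that registers a backend class for one or more URI schemes."""

    def decorator(cls: type) -> type:
        for scheme in schemes:
            _BACKENDS[scheme] = cls
        return cls

    return decorator


@register_backend("memory")
class InMemoryBackend:
    """Toy backend used only to exercise the registry."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self._states: Dict[str, Dict[str, Any]] = {}

    def get(self, state_id: str) -> Dict[str, Any]:
        return dict(self._states.get(state_id, {}))

    def set(self, state_id: str, state: Dict[str, Any]) -> None:
        self._states[state_id] = dict(state)


def backend_for(uri: str) -> StateBackend:
    """Pick a backend from the URI scheme, e.g. 's3' for s3://some_bucket/some_prefix."""
    scheme = urlparse(uri).scheme
    try:
        factory = _BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"no state backend registered for scheme {scheme!r}") from None
    return factory(uri)
```

Keeping the built-in URI backends behind the same registry means that supporting user-contributed plugins later would mostly be a matter of letting external packages call `register_backend`. The core dispatch would not need to change.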
@aaronsteers –
100%, that’s exactly the plan.
Yeah, I think that makes sense as the next step and should be pretty straightforward to implement after these changes.