feature: Offer a method to clean up old states in meltano.db
Feature scope
API
Description
Our Meltano project has a relatively large state (around 20MB), so the size of meltano.db grows quickly. As far as I can tell, the StateService currently offers no way to remove old run/job records, and I haven't found any housekeeping option in the config, so we would probably have to work around this by opening the DB with SQLite and deleting the records ourselves.
It would be nice if we could do this in an official way.
Can we possibly:
- Have a CLI command to clean up old states? and/or
- Configure Meltano to do housekeeping, cleaning up old states at some point?
IMO, the feature makes sense even for normal-sized states, since we eventually need to deal with an ever-growing runs table.
Issue Analytics
- State:
- Created a year ago
- Comments:6
Top GitHub Comments
I made a PR #6823 to resolve this issue. I’m sharing more context here:
Our use case and the motivation behind this feature request
We are maintainers of target-miso. Many of our users' projects involve handling updates and hard deletes from sources that don't support the log-based replication method, and we solve this with a full-table method combined with a comparison of record hashes stored in the state. This is a really powerful pattern, with the only caveat being the growing size of meltano.db. This feature helps us keep the size (rather) bounded.
How we work around it at the moment
A cronjob which deletes old runs using SQLite, keeping a fixed number of latest runs per state.
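For anyone who needs something similar in the meantime, here is a minimal sketch of that kind of cleanup in Python. It is not the code from the PR and not an official Meltano API. The table name `runs` and the columns `id`, `job_name` and `started_at` are assumptions about the schema, so check them against your own meltano.db (and back the file up) before running anything like this. `prune_old_runs` and `PRUNE_SQL` are hypothetical names used only for illustration.

```python
import sqlite3

# ASSUMPTION: a `runs` table with `id`, `job_name` and `started_at` columns.
# These names are illustrative; verify them against your own meltano.db.
PRUNE_SQL = """
DELETE FROM runs
WHERE id NOT IN (
    SELECT r2.id
    FROM runs AS r2
    WHERE r2.job_name = runs.job_name
    ORDER BY r2.started_at DESC, r2.id DESC
    LIMIT ?
)
"""


def prune_old_runs(conn: sqlite3.Connection, keep: int = 5, vacuum: bool = False) -> int:
    """Delete all but the `keep` most recent rows per job_name.

    Returns the number of rows deleted.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    cur = conn.execute(PRUNE_SQL, (keep,))
    deleted = cur.rowcount
    conn.commit()
    if vacuum:
        # With SQLite's default settings, deleting rows frees pages inside
        # the file but does not shrink it; VACUUM rebuilds the file.
        conn.execute("VACUUM")
    return deleted
```

A cron entry could call this with something like `prune_old_runs(sqlite3.connect("path/to/meltano.db"), keep=5, vacuum=True)`, where the path is a placeholder for wherever your project keeps its system database. If your Meltano version derives state from more than the most recent run record, a very small `keep` could drop state you still need, so a conservative value is safer.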
@simonpai thanks for the issue!
There’s an argument for not offering a method to clean up the state DB because it’s potentially useful data for understanding past runs. But I understand the problem here and would be supportive of adding a meltano state remove/delete command. By default it could delete everything but the past X runs, but I could see that being configurable as well.
We won’t be prioritizing this ourselves in the near term but would happily take a PR!