feature: Offer a method to clean up old states in meltano.db

Feature scope

API

Description

Our Meltano project has a relatively large state (around 20 MB), so meltano.db grows quickly. As far as I can tell, StateService currently offers no way to remove old run/job records, and I haven’t found any housekeeping option in the config, so we would probably have to work around this by opening the database with SQLite and deleting the records ourselves. It would be nice to have an official way to do this.

Can we possibly:

  1. Have a CLI command to clean up old states? and/or
  2. Configure Meltano for housekeeping, so that old states get cleaned up at some point?

IMO, the feature makes sense even for normal-sized states, since we eventually need to deal with an ever-growing runs table.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6

Top GitHub Comments

1 reaction
simonpai commented, Oct 2, 2022

I opened PR #6823 to resolve this issue, and I’m sharing more context here:

Our use case and the motivation behind this feature request

We are the maintainers of target-miso. Many of our users’ projects need to handle updates and hard deletes from sources that don’t support the log-based replication method. We solve this with the full-table method combined with record hash comparison, where the hashes are stored in the state. This is a really powerful pattern, and its only caveat is the growing size of meltano.db. This feature helps us keep that size (roughly) bounded.
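To illustrate why this pattern makes the state large, here is a minimal sketch of full-table hash comparison. It is not target-miso’s actual implementation: the function names, the `id` primary key, and the shape of the stored hashes are all assumptions made for illustration.

```python
import hashlib
import json


def record_hash(record):
    """Return a stable digest of one record (keys sorted so field order is irrelevant)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_full_table(records, previous_hashes, key="id"):
    """Compare a full-table extract against the hashes kept from the previous run.

    `previous_hashes` maps primary key (as str) -> digest. The `key` field name
    is a hypothetical primary key, not something taken from the issue.
    Returns (records to upsert, primary keys to hard-delete, new hashes to store).
    """
    new_hashes = {}
    upserts = []
    for record in records:
        pk = str(record[key])
        digest = record_hash(record)
        new_hashes[pk] = digest
        # A missing or different digest means the record is new or has changed.
        if previous_hashes.get(pk) != digest:
            upserts.append(record)
    # Keys seen last time but absent now were hard-deleted at the source.
    deleted = sorted(set(previous_hashes) - set(new_hashes))
    return upserts, deleted, new_hashes
```

Because one digest is kept per source record, the stored state grows with the table size, and every run persists a fresh copy of it.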

How we work around it at the moment

A cronjob which deletes old runs using SQLite, keeping a fixed number of latest runs per state.
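A workaround along these lines could look roughly like the sketch below. The issue only mentions a runs table in meltano.db, so the table name and the `id`, `job_name`, and `started_at` columns are assumptions; check the actual schema and back up the database before running anything like this.

```python
import sqlite3

# Assumed names, not confirmed by the issue: adjust to the real schema.
RUNS_TABLE = "runs"
STATE_KEY_COLUMN = "job_name"  # hypothetical column identifying one state
ORDER_COLUMN = "started_at"    # hypothetical column used to find the latest runs


def prune_old_runs(conn, keep=5):
    """Delete all but the `keep` most recent rows per state. Returns the deleted row count."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # For each row, the correlated subquery lists the ids of the `keep` newest
    # rows sharing its state key; rows outside that list are removed.
    # `IS` is used instead of `=` so rows with a NULL key are grouped together.
    sql = f"""
        DELETE FROM {RUNS_TABLE}
        WHERE id NOT IN (
            SELECT newer.id
            FROM {RUNS_TABLE} AS newer
            WHERE newer.{STATE_KEY_COLUMN} IS {RUNS_TABLE}.{STATE_KEY_COLUMN}
            ORDER BY newer.{ORDER_COLUMN} DESC, newer.id DESC
            LIMIT ?
        )
    """
    with conn:  # commit on success, roll back on error
        deleted = conn.execute(sql, (keep,)).rowcount
    return deleted


if __name__ == "__main__":
    # Demo against a throwaway in-memory database with the assumed schema.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE runs (id INTEGER PRIMARY KEY, job_name TEXT, started_at TEXT)"
    )
    demo.executemany(
        "INSERT INTO runs (job_name, started_at) VALUES (?, ?)",
        [("demo-job", f"2022-10-{day:02d}") for day in range(1, 11)],
    )
    print("deleted rows:", prune_old_runs(demo, keep=3))
```

Note that deleting rows does not by itself shrink the SQLite file; running `VACUUM` afterwards rebuilds the database and returns the freed pages to the filesystem.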

1 reaction
tayloramurphy commented, Sep 20, 2022

@simonpai thanks for the issue!

There’s an argument for not offering a method to clean up the state DB because it’s potentially useful data for understanding past runs. But I understand the problem here and would be supportive of adding a meltano state remove/delete command. By default it could delete everything but the past X runs but I could see that being configurable as well.

We won’t be prioritizing this ourselves in the near term but would happily take a PR!
