Shall we move from pickles to SQLite3 for local data storage?
We’re currently running all kinds of local storage on Python pickles. While there haven’t been significant problems with them, there’s some room for improvement.
I came up with the idea of employing a lightweight, easy-to-use DB (I vote for SQLite 3). This way we control the local storage better since there’s only one `.db` file. SQLite 3 is also easy to set up - no setup needed, actually 😃.
We aren’t going to make it too complex - one table may contain only 3 or so columns. As I initially thought, we may be using only `SELECT`, `INSERT`, `UPDATE` and `DELETE` - Smokey’s data isn’t complex at all and needs no `JOIN` or similar queries, making this easy for people with virtually zero DB knowledge to maintain (I’m one of them).
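As a rough illustration of how small that surface is, here is a minimal sketch of the four statements using Python's standard `sqlite3` module. The table name `blacklisted_users` and its three columns are assumptions made up for this example, not the schema in the `db` branch.

```python
import sqlite3

# In-memory database for illustration; a real setup would pass a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical three-column table; all names here are illustrative only.
conn.execute(
    "CREATE TABLE blacklisted_users ("
    " site TEXT NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " reason TEXT,"
    " PRIMARY KEY (site, user_id))"
)

# INSERT: '?' placeholders avoid building SQL by string formatting.
conn.execute(
    "INSERT INTO blacklisted_users (site, user_id, reason) VALUES (?, ?, ?)",
    ("stackoverflow.com", 12345, "spam"),
)

# UPDATE the stored reason for that user.
conn.execute(
    "UPDATE blacklisted_users SET reason = ? WHERE site = ? AND user_id = ?",
    ("repeat spammer", "stackoverflow.com", 12345),
)

# SELECT it back; fetchone() returns a tuple, or None if nothing matched.
row = conn.execute(
    "SELECT reason FROM blacklisted_users WHERE site = ? AND user_id = ?",
    ("stackoverflow.com", 12345),
).fetchone()

# DELETE the entry again and count what is left.
conn.execute(
    "DELETE FROM blacklisted_users WHERE site = ? AND user_id = ?",
    ("stackoverflow.com", 12345),
)
remaining = conn.execute("SELECT COUNT(*) FROM blacklisted_users").fetchone()[0]

conn.commit()
conn.close()
```

No joins, views or triggers are involved; each operation is a single statement against one table.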
Benefits we’re getting:
- Less RAM usage
- Easier data inspection and maintenance (use the `sqlite3` CLI tool)
- Easier backup, migration, adding new stored content (like logs), and intra-instance transfer
- Potentially less disk I/O (dubious)
- No more pickling/unpickling errors, unless the whole `.db` file is corrupt, which happens far less often than a pickle file getting corrupted
- Potentially easier migration to Helios (if it’s still alive)
What we’re paying for this:
- More frequent disk I/O
- Potential difficulty in expanding columns
- Potential performance degradation (disk is always slower than RAM). I don’t think this one will matter much - the majority of CPU time is spent running regexes, and the majority of idle time is spent waiting for network responses. If a server has a slow disk, this would be an issue. Otherwise, not much.
An early draft of an example is in the `db` branch, which contains the infrastructure as well as the blacklisted users migrated from pickles to the DB. CI passes and it can be safely merged now.
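For readers who have not seen the branch, a pickle-to-SQLite migration could look roughly like the sketch below. It is not the code in the `db` branch: the function name `migrate_blacklist`, the file names, the table layout, and the assumption that the pickle holds a `{(site, user_id): reason}` dict are all hypothetical.

```python
import os
import pickle
import sqlite3
import tempfile


def migrate_blacklist(pickle_path, db_path):
    """Copy a pickled {(site, user_id): reason} dict into a SQLite table.

    Returns the number of rows in the table after the migration.
    """
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS blacklisted_users ("
                " site TEXT NOT NULL,"
                " user_id INTEGER NOT NULL,"
                " reason TEXT,"
                " PRIMARY KEY (site, user_id))"
            )
            # INSERT OR REPLACE makes re-running the migration harmless.
            conn.executemany(
                "INSERT OR REPLACE INTO blacklisted_users VALUES (?, ?, ?)",
                [(site, uid, reason) for (site, uid), reason in data.items()],
            )
        return conn.execute(
            "SELECT COUNT(*) FROM blacklisted_users"
        ).fetchone()[0]
    finally:
        conn.close()


# Demo with a throwaway pickle file standing in for the real one.
workdir = tempfile.mkdtemp()
pickle_path = os.path.join(workdir, "blacklist.p")
db_path = os.path.join(workdir, "store.db")
with open(pickle_path, "wb") as f:
    pickle.dump(
        {("stackoverflow.com", 1): "spam", ("superuser.com", 2): "abuse"}, f
    )

migrated = migrate_blacklist(pickle_path, db_path)
```

The pickle file is left untouched, so the old storage stays available as a fallback until the new path has been verified.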
Is it a good idea?
Issue Analytics
- Created 5 years ago
- Comments:14 (14 by maintainers)
Top GitHub Comments
To the extent that this adds benefits, those seem to be shadowed by the benefits that Helios would bring. So let’s try to revive the Helios branch instead, shall we?
Chiming in on the Helios points:
I’d love to move over to Helios. It’s close to a year since the initial framework was set up, and that’s where it (mostly) died. Interest in converting to it seems to have come to a halt. I have no problem with this - it is a rather large change (and even more so now that my Smokey branch is a year out of date), it requires changes to both Smokey and Metasmoke, and it effectively adds a third peg on which the entire solution stands.
If we aren’t moving to a cloud-based database (aka Helios), I am fully in support of moving away from pickles (and config data, but I already argued that point too). @Undo1 is right that there are some synchronization issues, but they are going to be related to the config data I mentioned. In terms of keeping the blacklists in sync, I don’t think we’d see anything different from what we do today.
@iBug Something that would be helpful to see is a comparison in terms of file sizes. SQLite has the `VACUUM` command that can optimize space, but I don’t know how well it works, especially after adding/removing “a lot” (trademark pending) of entries. How does this single database compare in terms of on-disk usage and in terms of memory usage when loading data into memory?
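One way to get a first answer to the on-disk part of that question is a small experiment like the sketch below. The data is synthetic and the table and file names are made up, so the numbers only hint at the trend and say nothing definitive about Smokey's real data. It also does not measure memory usage.

```python
import os
import pickle
import sqlite3
import tempfile

# Synthetic rows standing in for real entries (hypothetical shape).
rows = [("example.com", i, "reason %d" % i) for i in range(5000)]

workdir = tempfile.mkdtemp()
pickle_path = os.path.join(workdir, "data.p")
db_path = os.path.join(workdir, "data.db")

# Baseline: the same rows stored as a single pickle.
with open(pickle_path, "wb") as f:
    pickle.dump(rows, f)
size_pickle = os.path.getsize(pickle_path)

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE entries (site TEXT, user_id INTEGER, reason TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
conn.commit()
size_full = os.path.getsize(db_path)

# Delete 90% of the rows. With SQLite's default settings the freed pages
# stay inside the file, so it does not shrink on its own.
conn.execute("DELETE FROM entries WHERE user_id >= 500")
conn.commit()
size_after_delete = os.path.getsize(db_path)

# VACUUM rebuilds the file without the free pages. It cannot run inside
# an open transaction, hence the commit above.
conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(db_path)
conn.close()

print("pickle:            ", size_pickle, "bytes")
print("db, all rows:      ", size_full, "bytes")
print("db, after delete:  ", size_after_delete, "bytes")
print("db, after VACUUM:  ", size_after_vacuum, "bytes")
```

Running the same measurement against a copy of the actual pickle files would give a more meaningful comparison than this synthetic data.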