csvs-to-sqlite 2.0: dropping Pandas in favour of sqlite-utils
My sqlite-utils library has evolved to the point where I think it would make a good foundation for the next version of csvs-to-sqlite.
The main feature I’m excited about here is being able to handle giant CSV files - right now they have to be loaded into memory by Pandas, but sqlite-utils has similar functionality which handles them as streams, reducing the amount of memory needed to consume a huge file.
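As a rough sketch of that streaming pattern (the file and table names here are placeholders, not anything from the issue), the sqlite-utils Python API accepts any iterable of dictionaries, so a csv.DictReader can feed rows into the database one at a time:

```python
import csv

import sqlite_utils

db = sqlite_utils.Database("data.db")
with open("huge.csv", newline="") as f:
    # DictReader yields one row at a time and insert_all()
    # writes them in batches, so the whole CSV never has to
    # fit in memory at once
    db["rows"].insert_all(csv.DictReader(f))
```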
I intend to keep the CLI API as close to the current one as possible in the new version, but this is a big change, so it’s likely some cases will break. As such, I intend to keep the 1.x branch around (and maintained with bug fixes) for users who find that 2.0 doesn’t work for them.
I’ll pin this issue for a few weeks so people can comment on this plan before I start executing.
Top GitHub Comments
If your needs are simple - just loading a single CSV file - then yes, I’d recommend sqlite-utils instead - it has better performance as it works by streaming files rather than loading them all into memory. csvs-to-sqlite is still better if you want to transform a folder full of files.

csvs-to-sqlite still seems great for dumping a large number of (not too huge) CSVs into a single SQLite db file. For example, if you have a directory or subdirectories of CSVs that you want to bundle together:
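Something like this, where `csvs/` and `bundle.db` are stand-in names - the tool takes one or more file or directory paths followed by the database to create:

```bash
csvs-to-sqlite csvs/ bundle.db
```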
AFAIK, sqlite-utils would need you to write additional code to handle more than one CSV (or TSV/JSON) file at a time.
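For what it’s worth, that additional code can be quite short. Here is one possible sketch (the directory layout and the table-per-file naming are my assumptions, not part of sqlite-utils) that loads every CSV under a directory into its own table:

```python
import csv
from pathlib import Path

import sqlite_utils

db = sqlite_utils.Database("bundle.db")
for path in Path("csvs").rglob("*.csv"):
    with path.open(newline="") as f:
        # One table per file, named after the file's stem;
        # alter=True adds any columns not already on the table
        db[path.stem].insert_all(csv.DictReader(f), alter=True)
```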