Unusually large snapshots size
CrateDB version: v3.3.4
Environment description:
JVM version: openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3)
OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3, mixed mode, sharing)
Kernel: Linux chiphub 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Distribution: 18.04.3 LTS (Bionic Beaver)
Problem description:
I have set up a Python script that creates a CrateDB snapshot every day at noon. The query I ran to initially set up the repository is:
CREATE REPOSITORY repo_name TYPE FS WITH (LOCATION='/path/to/folder', compress=true);
The query I run every day to create the snapshot is:
CREATE SNAPSHOT repo_name.{} ALL WITH (wait_for_completion=true, ignore_unavailable=true);
On the initial run, the snapshot directory was the same size as the database (30GB).
After about a month, the database has grown to 40GB while the snapshot directory has grown to ~120GB (about three times the size of the database!).
Is this normal?
If yes, are there any options/optimizations I can try out to reduce the size of the snapshots?
Issue Analytics
- Created 4 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
Hey @marregui @mfussenegger,
Apologies for the delayed response. Thanks a lot for your input. I’ll fine-tune my Python script to delete the prior snapshots once a new one is created, as suggested, and see how it goes. I will reopen the issue in case I run into the problem again.
Regards, Kunal
To add to what @marregui already mentioned:
Snapshots are incremental. If you keep all the snapshots, you can also restore an earlier one, which is why the repository has to keep the older data around.
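A toy model may make the growth easier to picture. The sketch below is not CrateDB's actual storage logic, and the file names and sizes are invented for illustration. It only assumes that a repository has to keep every file that is still referenced by at least one retained snapshot.

```python
def repository_size(snapshots, file_sizes):
    """Total size of all files referenced by at least one snapshot."""
    referenced = set()
    for files in snapshots.values():
        referenced |= set(files)
    return sum(file_sizes[name] for name in referenced)


# Hypothetical file names and sizes in GB, invented for this example.
file_sizes = {"seg_a": 30, "seg_b": 25, "seg_c": 10, "seg_d": 40}

# Each snapshot references the files that held the data when it was taken.
snapshots = {
    "day_01": {"seg_a"},           # initial copy of the whole database
    "day_15": {"seg_b", "seg_c"},  # live data now sits in different files
    "day_30": {"seg_d"},           # and has been rewritten once more
}

# Keeping all three snapshots keeps all four files.
print(repository_size(snapshots, file_sizes))    # 105

# Keeping only the newest snapshot leaves a single referenced file.
latest_only = {"day_30": snapshots["day_30"]}
print(repository_size(latest_only, file_sizes))  # 40
```

In this model the repository is larger than the live data whenever older snapshots reference files that the database itself has since replaced.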
Instead of creating a new repository, an alternative is to run
DROP SNAPSHOT
on older snapshots, so that files that are no longer referenced can be cleaned up.
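A minimal sketch of how a daily script could prune older snapshots this way is shown below. The helper names (`snapshots_to_drop`, `drop_statements`, `prune`) and the retention count are hypothetical. The sketch also assumes a DB-API style cursor with `?` placeholders, and that the `sys.snapshots` table exposes `name`, `started`, and `repository` columns; check both against your CrateDB version before relying on it.

```python
import re

# Only plain identifiers are accepted, because the names are interpolated
# into the statement text rather than passed as parameters.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def snapshots_to_drop(snapshots, keep):
    """Return the names of all but the `keep` most recent snapshots.

    `snapshots` is an iterable of (name, started) pairs, where `started`
    is any sortable value such as an epoch timestamp.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [name for name, _ in ordered[keep:]]


def drop_statements(repo, names):
    """Build one DROP SNAPSHOT statement per snapshot name."""
    for ident in [repo, *names]:
        if not _IDENT.match(ident):
            raise ValueError(f"unexpected identifier: {ident!r}")
    return [f"DROP SNAPSHOT {repo}.{name}" for name in names]


def prune(cursor, repo, keep=7):
    """Drop all but the `keep` newest snapshots in `repo` (assumed schema)."""
    cursor.execute(
        "SELECT name, started FROM sys.snapshots WHERE repository = ?",
        (repo,),
    )
    rows = [(row[0], row[1]) for row in cursor.fetchall()]
    for statement in drop_statements(repo, snapshots_to_drop(rows, keep)):
        cursor.execute(statement)
```

With the `crate` Python client, this might be called as `prune(connection.cursor(), "repo_name", keep=7)` right after the daily `CREATE SNAPSHOT` has completed. Snapshot names that are not plain identifiers (for example, names starting with a digit) are rejected by this sketch rather than quoted.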