
Unusually large snapshot size


CrateDB version: v3.3.4

Environment description:

JVM version: openjdk version “11.0.4” 2019-07-16; OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3); OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3, mixed mode, sharing)

Kernel: Linux chiphub 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Distribution: Ubuntu 18.04.3 LTS (Bionic Beaver)

Problem description:

I have set up a Python script that creates a CrateDB snapshot every day at noon. I initially set up the repository with this query:

CREATE REPOSITORY repo_name TYPE FS WITH (LOCATION='/path/to/folder', compress=true);

Every day, the script creates a snapshot with this query:

CREATE SNAPSHOT repo_name.{} ALL WITH (wait_for_completion=true, ignore_unavailable=true);

On the initial run, the snapshot directory was the same size as the database (30 GB). After about a month, the database has grown to 40 GB, while the snapshot directory has grown to ~120 GB, almost three times the size of the database.

Is this normal? If so, are there any options or optimizations I can try to reduce the size of the snapshots?
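A minimal sketch of the daily job described above, assuming the script talks to CrateDB through a DB-API cursor. The issue does not say how the `{}` placeholder is filled, so the `snapshot_YYYY_MM_DD` naming scheme and the helper names here are hypothetical; only the CREATE SNAPSHOT statement itself comes from the issue.

```python
import datetime


def daily_snapshot_sql(repo: str, day: datetime.date) -> str:
    """Build the CREATE SNAPSHOT statement from the issue for a given day."""
    # Hypothetical naming scheme filling the {} placeholder, e.g. snapshot_2020_01_15.
    name = "snapshot_" + day.strftime("%Y_%m_%d")
    return (
        f"CREATE SNAPSHOT {repo}.{name} ALL "
        "WITH (wait_for_completion=true, ignore_unavailable=true)"
    )


def run_daily_snapshot(cursor, repo: str = "repo_name", day=None) -> str:
    """Execute the day's snapshot on any DB-API cursor and return the statement."""
    if day is None:
        day = datetime.date.today()
    stmt = daily_snapshot_sql(repo, day)
    # wait_for_completion=true makes the call block until the snapshot is done.
    cursor.execute(stmt)
    return stmt
```

Assuming the `crate` Python package is installed and CrateDB listens on its default HTTP port 4200, the cursor could come from `client.connect("http://localhost:4200").cursor()` after `from crate import client`. Passing the cursor in keeps the statement logic testable without a running cluster.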

Created 4 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments


kunz07 commented, Jan 17, 2020

Hey @marregui @mfussenegger,

Apologies for the delayed response, and thanks a lot for your input. As suggested, I'll fine-tune my Python script to delete the prior snapshots once a new one is created and see how it goes. I will reopen the issue if I run into it again.

Regards, Kunal


mfussenegger commented, Jan 15, 2020

To add to what @marregui already mentioned:

Snapshots are incremental. If you keep all the snapshots, you can still restore any earlier one, which is why the repository has to keep all the data those snapshots reference.
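To make the incremental behavior concrete, here is a toy model in Python. It is not how CrateDB stores snapshots internally: the file names and 10 GB sizes are made up, and the model only assumes that each data file is stored once in the repository and kept as long as at least one snapshot references it.

```python
from typing import Dict, Set

# Toy model, not CrateDB internals: hypothetical file names, each 10 GB.
FILE_SIZES_GB: Dict[str, int] = {name: 10 for name in "abcdef"}


def repository_size(snapshots: Dict[str, Set[str]]) -> int:
    """Each file is stored once, as long as at least one snapshot references it."""
    referenced: Set[str] = set()
    for files in snapshots.values():
        referenced |= files
    return sum(FILE_SIZES_GB[f] for f in referenced)


def drop_snapshot(snapshots: Dict[str, Set[str]], name: str) -> Dict[str, Set[str]]:
    """Return the snapshots without `name`; files only it referenced disappear."""
    return {k: v for k, v in snapshots.items() if k != name}


# Day 1: the database consists of files a, b, c (30 GB).
# Day 2: b and c have been replaced by d, and e, f hold new data (40 GB).
snapshots = {"day1": {"a", "b", "c"}, "day2": {"a", "d", "e", "f"}}
print(repository_size(snapshots))                         # 60
print(repository_size(drop_snapshot(snapshots, "day1")))  # 40
```

The repository (60 GB) is larger than the current database (40 GB) because `day1` still references `b` and `c`; once `day1` is dropped, only the files of `day2` remain.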

Instead of creating a new repository, you can use DROP SNAPSHOT on older snapshots, so that files no longer referenced by any snapshot can be cleaned up.
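The following Python sketch applies that suggestion as a retention step that could run after each daily snapshot. It assumes that the `sys.snapshots` table exposes `name`, `repository`, and `started` columns (with `started` returned as a sortable timestamp), that the client uses `?` (qmark) query parameters, and that a DB-API cursor is available; the function names and the `keep` parameter are hypothetical, so check them against the CrateDB version in use.

```python
from typing import List, Sequence, Tuple

# Assumed row shape: (snapshot name, started timestamp) from sys.snapshots.
Row = Tuple[str, int]


def snapshots_to_drop(rows: Sequence[Row], keep: int) -> List[str]:
    """Names of all but the `keep` most recently started snapshots, oldest first."""
    if keep < 1:
        raise ValueError("keep at least one snapshot")
    ordered = sorted(rows, key=lambda row: row[1])  # oldest first
    return [name for name, _ in ordered[:-keep]]


def prune_snapshots(cursor, repo: str, keep: int = 1) -> List[str]:
    """Drop old snapshots so files referenced only by them can be cleaned up."""
    cursor.execute(
        "SELECT name, started FROM sys.snapshots WHERE repository = ?", (repo,)
    )
    doomed = snapshots_to_drop(cursor.fetchall(), keep)
    for name in doomed:
        cursor.execute(f"DROP SNAPSHOT {repo}.{name}")
    return doomed
```

With `keep=1`, this matches the plan in the reply above of deleting the prior snapshots once a new one is created. Keeping a few more snapshots costs extra space but leaves more restore points.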