[CEP] Partition synclog data into multiple databases
Abstract
Synclog records are currently stored in a single database. A new record is created for every sync request, and past records for a given user, app_id, and device_id combination are deleted after a form submission has been received.
To allow this data to be spread over more hardware resources, this CEP proposes a partitioning scheme that allows the data to reside in different locations based on the user_id field.
Motivation
With 600K users, ICDS has reached the limits of what a single DB VM can support without further vertical scaling. Considering that this load is not even half of full scale, it makes sense to look at alternatives to vertical scaling.
Specification
Unlike case and form data, which are partitioned by document ID, synclogs can be partitioned by the user ID. There are two reasons for doing this:
- There are some queries where only the user ID is known. Partitioning by user ID makes these queries efficient since they can be run on a single shard.
- Having the data for a user co-located makes it easy to prune old data.
Analysis of synclog usage shows that all queries appearing outside of test code can be made to include the user ID (or do not need it, in the case of aggregate queries).
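As a rough illustration of the partitioning described above, the sketch below hashes a user_id to a logical shard and maps that shard to a database alias. All names here (the shard count, the alias mapping, and the helper functions) are illustrative assumptions, not the actual CommCare HQ implementation.

```python
import hashlib

# Assumed fixed count of logical shards; only the physical mapping below
# would change when the cluster is scaled out.
NUM_LOGICAL_SHARDS = 1024

# Hypothetical mapping of logical shard ranges to Django database aliases
# for a scaled-out deployment with two synclog databases.
SHARD_RANGE_TO_DB_ALIAS = {
    range(0, 512): "synclogs_p1",
    range(512, 1024): "synclogs_p2",
}


def get_synclog_shard(user_id):
    """Map a user_id to a stable logical shard number."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_LOGICAL_SHARDS


def get_synclog_db_alias(user_id):
    """Return the database alias that holds all synclog data for this user."""
    shard = get_synclog_shard(user_id)
    for shard_range, alias in SHARD_RANGE_TO_DB_ALIAS.items():
        if shard in shard_range:
            return alias
    raise ValueError("No database configured for shard %s" % shard)
```

Because all of a user's records hash to the same shard, both single-user queries and pruning of a user's old records touch only one database.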
The partitioning and query routing will be done in Django, so no routing DB (e.g. plproxy) will be required.
The data will continue to be kept separate in its own set of databases. This allows the synclog cluster to scale independently and avoids adding more load to the OLTP cluster, which is already under high load.
The data will continue to be stored in PostgreSQL databases.
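A minimal sketch of what the Django-side routing could look like, assuming a SyncLogSQL model and the get_synclog_db_alias() helper from the sketch above (both names are assumptions, not the real model or API). The point is simply that each read or delete is issued against a single shard database via Django's .using():

```python
from datetime import datetime, timedelta

# SyncLogSQL is assumed to be the Django model for synclog records, with
# user_id and date fields; the real model and field names may differ.


def get_synclogs_for_user(user_id):
    """Fetch a user's synclogs from the one database that owns them."""
    db_alias = get_synclog_db_alias(user_id)
    return SyncLogSQL.objects.using(db_alias).filter(user_id=user_id)


def prune_old_synclogs(user_id, max_age_days=30):
    """Delete a user's old synclogs; the whole delete runs on a single shard."""
    cutoff = datetime.utcnow() - timedelta(days=max_age_days)
    db_alias = get_synclog_db_alias(user_id)
    SyncLogSQL.objects.using(db_alias).filter(
        user_id=user_id, date__lt=cutoff
    ).delete()
```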
Impact on users
None
Impact on hosting
The implementation will be done so that, by default, all of the logical shards exist in a single database, which will be functionally the same as the current setup. This means there will be no changes for third parties hosting CommCare.
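To illustrate the "no changes for third-party hosters" point: by default the shard-to-database mapping can simply point every logical shard at a single database, which behaves like the current setup. The settings fragment below is an assumption for illustration only; names and values are not the real configuration.

```python
# Hypothetical Django settings fragment. Synclogs get their own alias so the
# synclog cluster can scale independently of the main OLTP databases.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "commcarehq",
    },
    "synclogs": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "commcarehq_synclogs",
    },
}

# Default deployment: all 1024 logical shards map to the single "synclogs"
# database, functionally the same as today's single-database setup. A
# scaled-out deployment would split this mapping across more aliases.
SHARD_RANGE_TO_DB_ALIAS = {
    range(0, 1024): "synclogs",
}
```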
Backwards compatibility
None
Release Timeline
ICDS is targeting mid-February 2020 to release this.
Open questions and issues
- Will this require another plproxy node?
- No. The access patterns allow virtually all queries to be directed to a single node. This routing can be done from Django directly.
- Why not use the current sharded database instead of adding another one?
- Keeping this data separate avoids adding more load to the current OLTP databases.
- Why not move to MongoDB (or some other NoSQL database)?
- Introducing a completely new database would have additional cost implications for the team in terms of setup, maintenance, etc. There is also no reason to move, since the current technology is working well.
- Why not move to CitusDB?
- It would require an additional VM for the controller node.
- Future expansion would require purchasing an enterprise license (or developing our own tooling).
- The simplicity of the synclog data access patterns does not warrant the complexity of CitusDB.
Top GitHub Comments
Disk usage dropped to 200GB.
I’m going to set this aside for now and implement a daily repack instead. I’ll leave this issue open, though, and revisit later.
@sravfeyn it is both data size and request load. I had meant to include some more analysis which I’ve now done:
Data size is currently 3.5TB. While doing the analysis I realised that the size of the data on disk does not nearly match up with the size and number of the records. This does make sense, since we delete a lot of records but the storage space is only reclaimed by doing a VACUUM FULL, which we don’t do.
I tested pg_repack on one table and it dropped from 650GB to 5GB, which is very promising. I’m going to repack the remaining tables, which should reduce the disk size dramatically and hopefully also improve performance. I’ll leave this open for now, but it may not be necessary after all.