Performance Tuning for MySQL
Background
Just a tracker issue for doing some performance tuning for MySQL. Looking to capture notes on this card along the way.
As we move all repositories to MySQL, it is imperative to understand the operational characteristics of MySQL, especially as it pertains to VinylDNS.
The critical aspects from a size and performance standpoint are as follows.
Bulk inserting for record set and record changes
When a zone is loaded into VinylDNS for the first time, we run an AXFR full zone transfer and store all record sets in the VinylDNS system. We also store a corresponding record change for each record set at the same time.
For most zones, performance is a non-issue. However, when zones have 100,000s or 1MMs of record sets in them, this operation can be understandably expensive.
Archiving of record changes
With DynamoDB, it is relatively cheap to store all data in the record change table long-term. However, with MySQL, disk space is limited compared to DynamoDB. We need a plan for archiving record changes, potentially configuration driven, with background jobs / processes / stored procedures and triggers that move data past a certain age into some kind of long-term backup.
This may not be necessary for all VinylDNS users, as it depends on data size.
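A hedged sketch of what a configuration-driven archive job could execute; the record_change and record_change_archive table names, the created column, and the one-year cutoff are assumptions for illustration, not existing VinylDNS schema:

-- Copy changes older than the cutoff into an archive table, then remove them
INSERT INTO record_change_archive
SELECT * FROM record_change
WHERE created < DATE_SUB(NOW(), INTERVAL 1 YEAR);

DELETE FROM record_change
WHERE created < DATE_SUB(NOW(), INTERVAL 1 YEAR);
-- In practice the DELETE would be run repeatedly with a LIMIT so that
-- transactions stay short and locks are not held on a busy table.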
Query performance for record set and record changes
VinylDNS installations may have 1MMs of record sets and record changes. For example, we anticipate Comcast to grow to 100MMs of record sets and 100GB of record changes. At this size, it is imperative to have an archiving strategy for old record changes.
Indexing and query performance will be critical to the responsiveness of the application. We have to fine tune indexes and understand query execution plans for record and record change data.
Notes
Bulk Insert in MySQL
Bulk insert in MySQL is achieved using an extended insert:
INSERT INTO table(a, b, c) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?)
The SQL shows that you can keep appending additional rows under the VALUES clause seemingly indefinitely.
The number of rows you can insert in a single query depends on a number of factors, most importantly the max_allowed_packet setting.
There are a number of ways to do bulk inserts with MySQL, including file loading. High-speed inserts with MySQL covers the different ways to bulk load data. Key takeaway: extended inserts achieve 201,000 inserts per second at 26 bytes per row on average.
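The file-loading route mentioned above is MySQL's LOAD DATA INFILE. A minimal sketch against a hypothetical table_name with columns a, b, c (the file path is a placeholder):

-- Load comma-separated rows straight from a file on the client
LOAD DATA LOCAL INFILE '/tmp/rows.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ','
(a, b, c);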
Proposed recordset table
CREATE TABLE recordset (
id CHAR(36) NOT NULL,
zone_id CHAR(36) NOT NULL,
name VARCHAR(256) NOT NULL,
fqdn VARCHAR(256) NOT NULL,
type TINYINT NOT NULL,
data BLOB NOT NULL,
PRIMARY KEY (id),
INDEX zone_id_name_index (zone_id, name, type),
INDEX fqdn_index (fqdn, type)
);
Inspecting the record set table, we have the following field sizes:
- id - ascii - CHAR(36) - 1 byte per character = 36 bytes
- name - ascii - VARCHAR(256) - 1 byte per character = 36 bytes
- fqdn - ascii - VARCHAR(256) - 1 byte per character = 36 bytes
- type - TINYINT - 1 byte per entry = 1 byte
- data - BLOB - the protobuf byte array of the record set. Currently, this is roughly 150 bytes
IMPORTANT: This makes each record in the record set table ~260 bytes
For example, sending 1000 inserts in a batch would be approximately 260,000 bytes of data.
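As a concrete sketch against the proposed recordset table (the ids, names, and the hex literals standing in for the protobuf blobs are all placeholders):

INSERT INTO recordset (id, zone_id, name, fqdn, type, data)
VALUES
('6b2f6e7e-0000-0000-0000-000000000001', '9f2b8a6e-0000-0000-0000-000000000000', 'www', 'www.example.com.', 1, x'0A03777777'),
('6b2f6e7e-0000-0000-0000-000000000002', '9f2b8a6e-0000-0000-0000-000000000000', 'mail', 'mail.example.com.', 1, x'0A046D61696C');
-- repeat the tuple once per row in the batch, keeping the total statement
-- size under max_allowed_packet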
max_allowed_packet
The maximum max_allowed_packet in MySQL is presently 1GB. That size is untenable to work with.
If we were to work with a 4MB packet size, we should be able to batch 10,000 records per query with lots of headroom (10,000 records x 260 bytes = 2.6MB). Headroom is necessary in the event we need to add more fields, or if we max out all of the field sizes in a row.
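For reference, the limit can be inspected and raised at runtime (the change only applies to connections opened afterward; a permanent value belongs in my.cnf):

SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 4194304;  -- 4MB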
CONCLUSION
The performance of MySQL at our sizes (100MMs of records) and query access patterns seems reasonable. It also gives us the added benefit of easier querying via SQL.
The downside comes with locking / transactions. VinylDNS is built with an eventually consistent / last-write-wins model for persistence. We are happy to overwrite records in the database. Unfortunately, there is no way to accomplish this with InnoDB in MySQL without the additional locking overhead it imposes.
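For completeness, the overwrite semantics can be expressed with MySQL's INSERT ... ON DUPLICATE KEY UPDATE; a sketch with placeholder values is below. Note that this still takes InnoDB row locks on the conflicting key, so it captures last-write-wins but does not remove the locking overhead described above:

INSERT INTO recordset (id, zone_id, name, fqdn, type, data)
VALUES ('6b2f6e7e-0000-0000-0000-000000000001', '9f2b8a6e-0000-0000-0000-000000000000', 'www', 'www.example.com.', 1, x'0A03777777')
ON DUPLICATE KEY UPDATE
zone_id = VALUES(zone_id),
name = VALUES(name),
fqdn = VALUES(fqdn),
type = VALUES(type),
data = VALUES(data);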
Insert Testing
At very large data sizes (250MM records, 250MM record set changes, 2MM zones) we were able to sustainably achieve insert throughput of 2,200 record sets + changes per second. Subsequent testing revealed that this could be as high as 3,000 RPS. Given that large zones enter VinylDNS sporadically, 2,200 RPS is more than sufficient to meet our needs.
Query Testing
At large data sizes (100MM record sets), all record set queries consistently returned in the very low milliseconds across all query types. Assuming that indexes are set up properly, queries should continue to be responsive.
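A quick way to confirm the indexes are doing their job is to EXPLAIN the two main query shapes against the proposed table (the literal ids and names below are placeholders):

-- Lookup by zone, record name, and type; should use zone_id_name_index
EXPLAIN SELECT id, data FROM recordset
WHERE zone_id = '9f2b8a6e-0000-0000-0000-000000000000' AND name = 'www' AND type = 1;

-- Lookup by fully qualified domain name and type; should use fqdn_index
EXPLAIN SELECT id, data FROM recordset
WHERE fqdn = 'www.example.com.' AND type = 1;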
Changes needed
- DynamoDB is seemingly “unbounded” in data sizes and memory, whereas MySQL is not. To fully migrate to MySQL, we must have an archiving strategy for old zone and record changes.
- Support separate writer and reader connections to MySQL. Aurora specifically supports this option automatically, and it would allow us to run lock-free queries on the reader endpoint while contention happens on the writer endpoint.
Changes recommended
- Zone Loading currently just follows a zone sync process to load zones into the database. Converting DNS zone data via AXFR to record sets takes up a lot of memory. We should instead stream the changes into the database via FS2, avoiding memory pressure.
- Zone syncing is extremely expensive from a memory perspective. We load all record sets from VinylDNS, as well as all record sets from the DNS AXFR, and perform a diff/merge process. If we have very large zones approaching 1MM records, we may run out of memory. A different process for zone syncing should be designed that can do the same diff/merge but in a more memory-safe way.
TABLED DISCUSSION
- We should have a separate process for initial loading of zones other than zone syncing. This process can be built using FS2, where we stream / chunk records and record changes in. This will keep the amount of memory used during loading of large zones well managed. Make this an issue!
- Have to figure out how to better do zone syncing for large zones. Presently, we load all of our records out of the database into an in-memory data structure, load all of the DNS records via an AXFR, and do a diff/merge process on the two. We may be able to accomplish a better version with intelligent record lookup from VinylDNS, mitigating the need to do an entire bulk load for the zone sync process (see the sketch below). This should probably be done!
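One possible shape for that intelligent record lookup, sketched under the assumption that the AXFR is processed in chunks: for each chunk, fetch only the matching VinylDNS rows and diff/merge chunk by chunk rather than holding the whole zone in memory (the zone id and names are placeholders):

SELECT id, name, type, data
FROM recordset
WHERE zone_id = '9f2b8a6e-0000-0000-0000-000000000000'
AND name IN ('www', 'mail', 'ftp');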
Local testing
Results: 4 seconds!
Query Results - on server
The query results running in OpenStack had a marked improvement over those running on a local machine over VPN. Will not include all samples, but most query response times are in the very low milliseconds.