Serialization layer as serious bottleneck
We have been investigating the performance of our Corda node. Among a great many things we managed to optimize, and achieved some “ok-ish” numbers. A closer look now revealed that about 80% of the time is spent in the serialization layer. This is rather unexpected, as I would have expected the database, hashing and asymmetric crypto to be the main bottlenecks. The situation is aggravated by the fact that every transaction has a transaction id, which is computed as the hash of all its elements (input states, notaries, output states, time windows, etc.), each of which triggers a serialization again.
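For illustration, a minimal Kotlin sketch of why this hurts (hypothetical; Corda actually builds a Merkle tree over component groups, and serialize(...) is the helper shown below):

    import net.corda.core.crypto.SecureHash
    import net.corda.core.crypto.sha256

    // Illustrative only: computing the transaction id requires one
    // serialization pass per component before anything can be hashed.
    fun transactionId(components: List<Any>): SecureHash =
        components
            .map { serialize(it).bytes.sha256() }                    // one AMQP serialization each
            .reduce { a, b -> SecureHash.sha256(a.bytes + b.bytes) } // simplified hash fold, not the real Merkle tree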
For testing purposes, to get a closer look, we made use of:
    import net.corda.core.serialization.SerializationContext;
    import net.corda.core.serialization.SerializationFactory;
    import net.corda.core.serialization.SerializedBytes;

    private static SerializedBytes<Object> serialize(Object object) {
        SerializationFactory defaultFactory = SerializationFactory.Companion.getDefaultFactory();
        SerializationContext defaultContext = defaultFactory.getDefaultContext();
        return defaultFactory.serialize(object, defaultContext);
    }
and serialized a single state with about two dozen fields. The resulting byte array was 3869 bytes long. One CPU core managed to serialize 2800 of those objects per second. If one assumes that a great many objects are part of a transaction, the picture becomes clearer as to why it takes this amount of time.
To give a reference point, we serialized the same object with Jackson's ObjectMapper, first constructing a writer for the desired state type and then measuring the performance of serializing that state object. Jackson managed to serialize 99500 objects per second, a factor of about 40 compared to AMQP. The JSON output was 1065 bytes long. I consider JSON rather inefficient, yet it managed to be about 75% smaller than AMQP while still being “standalone”, not requiring an external model to deserialize. Protocol Buffers and friends would be another order of magnitude, but at the cost of an external model.
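For reference, a rough sketch of how such a comparison can be set up (the state object and loop count are placeholders; serialize(...) is the helper above):

    import com.fasterxml.jackson.databind.ObjectMapper

    // Rough micro-benchmark: Corda AMQP vs. a reusable, type-specific Jackson writer.
    fun compare(state: Any, n: Int = 10_000) {
        var t = System.nanoTime()
        repeat(n) { serialize(state) }                          // Corda AMQP
        println("AMQP:    ${n * 1e9 / (System.nanoTime() - t)} obj/s")

        val writer = ObjectMapper().writerFor(state.javaClass)  // built once, reused
        t = System.nanoTime()
        repeat(n) { writer.writeValueAsBytes(state) }           // Jackson JSON
        println("Jackson: ${n * 1e9 / (System.nanoTime() - t)} obj/s")
    }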
Looking at it with a profiler, one sees that heavy work is needed to serialize a great number of DescribedTypeElement instances. A closer look at the implementation shows, for example:
    val data = Data.Factory.create()
    data.withDescribed(Envelope.DESCRIPTOR_OBJECT) {
        withList {
            writeObject(obj, this, context)
            val schema = Schema(schemaHistory.toList())
            writeSchema(schema, this)
            writeTransformSchema(TransformsSchema.build(schema, serializerFactory), this)
        }
    }
....
A first measure might be to cache the serialization of the schema part, so that the byte array for a given schema history can be obtained directly from the cache, which may provide a decent speed-up. From a database perspective it may also prove worthwhile to store the data and the model separately, avoiding the redundant storage of the model part.
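A minimal sketch of that caching idea, assuming a cache keyed by the Schema and reusing the Corda-internal writeSchema(...) from the snippet above (names are illustrative):

    import java.util.concurrent.ConcurrentHashMap
    import org.apache.qpid.proton.codec.Data

    // Hypothetical cache: encode each distinct schema once and reuse the raw bytes.
    private val schemaCache = ConcurrentHashMap<Schema, ByteArray>()

    fun cachedSchemaBytes(schema: Schema): ByteArray =
        schemaCache.computeIfAbsent(schema) {
            val data = Data.Factory.create()
            writeSchema(it, data)   // Corda-internal helper from the snippet above
            data.encode().array     // raw bytes, appendable without re-walking the schema
        }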
For Corda to move towards more high-volume applications, this ticket feels rather important. Alternatively, it would also be nice to see plain JSON support (or something similar): JSON has widespread support across all devices, is easy to read and write, has standards for computing signatures, and has very performant implementations.
first draft in the commit above, 8x the performance
I took a deeper look at it in order to fix it. We expect a few tens of millions of records, so performance is critical.
There is a “quick” solution to make things faster by caching the schema. Instead of serializing the schema, I only put a small placeholder into the data structure:
SerializationOutput:
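The original snippet is elided here, so the following is only a minimal sketch of the idea, reusing the internals quoted earlier; PLACEHOLDER and the function name are illustrative, not the actual patch:

    // Write a short, recognisable marker where the schema would normally go;
    // it is swapped for the real schema bytes in a later post-pass.
    private val PLACEHOLDER = ByteArray(8) { 0x7F.toByte() }

    fun serializeWithPlaceholder(obj: Any, context: SerializationContext): Data {
        val data = Data.Factory.create()
        data.withDescribed(Envelope.DESCRIPTOR_OBJECT) {
            withList {
                writeObject(obj, this, context)
                putBinary(PLACEHOLDER)  // instead of writeSchema(...) / writeTransformSchema(...)
            }
        }
        return data
    }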
And in a second step I can patch the AMQP structure with the real schema. There are four main elements:
It is rather straightforward to find the right place and do the replacement. In our first use case the schema makes up 90% of the serialized bytes, so this saves roughly a factor of 10 in serialization work (at the cost of some new, simple array manipulations). As a minor catch, one further has to patch the AMQP ListElement, which holds the total size of all its data and therefore changes due to the placeholder. A further minor complication is that AMQP stores the length of the schema in the serialized output using a variable-length encoding, depending on whether it is larger or smaller than 255 bytes. Compared to Jackson it will still be a bit slow even if I gain that factor of 10, but that is not so surprising: this size encoding is part of what makes AMQP expensive, as the serializer has to traverse the complete object graph to compute the sizes of sub-elements, making that part almost as expensive as the “real” serialization.
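A simplified sketch of that post-pass, splicing the cached schema bytes over the placeholder (the list-header fix-up for the variable-length size is only noted, not implemented):

    // Replace the placeholder with the real schema bytes via plain array copies.
    fun patchSchema(serialized: ByteArray, placeholder: ByteArray, schemaBytes: ByteArray): ByteArray {
        val idx = indexOf(serialized, placeholder)
        require(idx >= 0) { "placeholder not found" }
        val out = ByteArray(serialized.size - placeholder.size + schemaBytes.size)
        System.arraycopy(serialized, 0, out, 0, idx)
        System.arraycopy(schemaBytes, 0, out, idx, schemaBytes.size)
        System.arraycopy(serialized, idx + placeholder.size,
                         out, idx + schemaBytes.size,
                         serialized.size - idx - placeholder.size)
        // NOTE: the enclosing list header (1-byte size for list8 vs. 4-byte size
        // for list32) must still be adjusted by the size delta; omitted here.
        return out
    }

    private fun indexOf(haystack: ByteArray, needle: ByteArray): Int {
        outer@ for (i in 0..haystack.size - needle.size) {
            for (j in needle.indices) if (haystack[i + j] != needle[j]) continue@outer
            return i
        }
        return -1
    }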
If there is interest in a PR, I could do that; I'm close to finishing it up for our use case. I'm cautiously optimistic about achieving around 1000 tps on an older eight-core machine (with further optimizations), rivaling the official 32/64-core numbers.
But the general question is where to move in this area. The serialization mechanism is inefficient and takes up a huge amount of space in the database, so a split of model and data would be desirable, at least for storage. Maybe the manipulation above could be a starting point for that as well. Or, of course, the possibility to support something else like JSON (maybe even https://www.w3.org/TR/vc-data-model/ to allow interaction with other systems). Since all this impacts both long-term storage and the interaction with non-Corda systems/clients, IMHO simplicity could be an important characteristic: understanding and replicating all the AMQP machinery is rather challenging, and support beyond Java is very limited. If there is interest in a JSON prototype PR, I may find time for that as well. I'm not quite sure if this will ever be an option or more of a “hell will first freeze over” scenario 😃, since there has been quite some investment in AMQP for serialization. For sure it would have to complement the existing serialization rather than replace it.