Binary serialization: simplified variable-length integers
I would like to propose an option to enable an alternative, simplified encoding scheme for variable-length integers.
A bit of context: I’m working on a binary client for PHP, and it took me well over a week of after-hours tinkering to get a working implementation of functions to read/write variable-size integers. The implementation is rather slow, with no way to optimize, since PHP only has one, platform-dependent, signed 32-bit or 64-bit integer type. It’s also brittle, since there’s no practical way to support values larger than 2 billion on a 32-bit PHP build.
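For context, this is roughly the shape of the read path a PHP client needs today, assuming a protobuf-style varint (7 data bits per byte, with the high bit as a continuation flag; as far as I can tell the real format additionally zigzag-encodes the sign). This is a minimal sketch of my reading of the format, not an authoritative implementation, and the loop of masks and shifts is exactly where a 32-bit PHP build falls over, because there is no integer type wide enough to shift into.

```php
<?php
// Sketch: decode an unsigned protobuf-style varint from $buf at $offset.
// Assumes a 64-bit PHP build; function name is made up for illustration.
function readUnsignedVarInt(string $buf, int &$offset): int {
    $result = 0;
    $shift  = 0;
    do {
        $b = ord($buf[$offset++]);
        $result |= ($b & 0x7F) << $shift;  // overflows past 31 bits on 32-bit PHP
        $shift  += 7;
    } while ($b & 0x80);                   // high bit set means "more bytes follow"
    return $result;
}
```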
Having a simplified, alternative encoding option for variable-length integers isn’t just about simplifying implementation in scripting languages, though. A more important thing to consider is the trade-off between bandwidth overhead and CPU overhead.
For example, in a replication scenario, where two servers are physically far from each other, conserving bandwidth may be the most desirable option.
On the other hand, in a scenario where an application server is on the same local network as the database server, saving CPU overhead from encoding on the database server, and from encoding/decoding on the application server (especially if it runs a client written in a scripting language like PHP or JS), may be worth a bit more bandwidth overhead.
I would propose a very simple encoding scheme:
(type:byte)(value:bytes)
where type is e.g. 1, 2, 3 or 17 (integer, short, long or byte) and the following value is a variable number of bytes, depending on the type of integer.
For values up to 127, that means an extra byte of overhead. For values from 128 to 255 it’s two bytes, same as now, and for larger values it’s a byte more than, or the same as, the current scheme.
Computationally though, this should be much simpler and faster to encode and decode - and much simpler to implement.
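To make that concrete, here is a minimal sketch in PHP of what an encoder/decoder for this scheme could look like. It assumes the type ids mentioned above (1 = integer, 2 = short, 3 = long, 17 = byte), big-endian byte order, and a 64-bit PHP build; the function names are made up for this example.

```php
<?php
// Illustrative sketch only: type-prefixed integers instead of varints.
// Assumed type ids: 1 = integer, 2 = short, 3 = long, 17 = byte.

function writeTypedInt(int $value): string {
    if ($value >= -128 && $value <= 127) {
        return chr(17) . pack('c', $value);             // byte:    1 + 1 bytes
    }
    if ($value >= -32768 && $value <= 32767) {
        return chr(2) . pack('n', $value & 0xFFFF);     // short:   1 + 2 bytes
    }
    if ($value >= -2147483648 && $value <= 2147483647) {
        return chr(1) . pack('N', $value & 0xFFFFFFFF); // integer: 1 + 4 bytes
    }
    return chr(3) . pack('J', $value);                  // long:    1 + 8 bytes
}

function readTypedInt(string $buf, int &$offset): int {
    $type = ord($buf[$offset++]);
    switch ($type) {
        case 17: // byte
            return unpack('c', $buf[$offset++])[1];
        case 2:  // short
            $v = unpack('n', substr($buf, $offset, 2))[1];
            $offset += 2;
            return $v >= 0x8000 ? $v - 0x10000 : $v;          // restore sign
        case 1:  // integer
            $v = unpack('N', substr($buf, $offset, 4))[1];
            $offset += 4;
            return $v >= 0x80000000 ? $v - 0x100000000 : $v;  // restore sign
        case 3:  // long
            $v = unpack('J', substr($buf, $offset, 8))[1];
            $offset += 8;
            return $v;   // on a 64-bit build the bits already carry the sign
        default:
            throw new UnexpectedValueException("Unknown integer type id: $type");
    }
}

// Usage: $buf = writeTypedInt(300); $offset = 0;
// readTypedInt($buf, $offset) returns 300, with $offset advanced to 3.
```

A single branch on the type byte replaces the loop of masks and shifts needed for a 7-bit varint, which is both simpler to write in languages with only one integer type and cheaper to execute.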
I’d be curious to see a bandwidth benchmark with this change in place in a real-world scenario - in practice, the difference in bandwidth may be almost negligible, and defaulting to a simplified encoding may actually be the best choice, slow networks being the only reasonable exception.
Issue Analytics
- Created 9 years ago
- Comments: 41 (26 by maintainers)
Top GitHub Comments
I have come to the same conclusion after years of experience with OrientDB and the drivers. We need a more robust and standardized request/response protocol. I recommend http://www.grpc.io/ - it is available for tons of languages and it would give OrientDB a big boost in the right direction.
It would solve a number of fundamental problems.
What are you going to do about all these problems with the OrientDB drivers? It would be very motivating if you could release some information about the future. @luigidellaquila @tglman thanks!
That’s too bad.
I am also pushing for a more robust driver standard (with the little influence I have). I am no expert, but I’ve been reading up on both gRPC and msgpack, and both sound like a much better basis to work with than what is currently available, simply because they are already well supported and documented in many languages.
I too would like to know how much work would be involved in switching to one of these quasi-standards. It has to be what I would call an enabling step for ODB, because currently all the driver support is still on wobbly legs. It should make the actual drivers much easier to write, you would think. And for sure, it is the language driver support that makes or breaks things like a database.
Scott