
Binary serialization: simplified variable-length integers


I would like to propose an option to enable an alternative, simplified encoding scheme for variable-length integers.

A bit of context: I’m working on a binary protocol client in PHP, and it took me well over a week of after-hours tinkering to get a working implementation of functions to read/write variable-length integers. The implementation is rather slow, with no way to optimize it, since PHP has only a single, platform-dependent, signed 32-bit or 64-bit integer type. It’s also brittle, since there’s no practical way to support values larger than 2 billion on a 32-bit PHP build.
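For illustration, the kind of variable-length encoding being discussed here is typically protobuf-style: 7-bit groups with the high bit of each byte used as a continuation flag, plus zigzag encoding so small negative values stay short. Below is a minimal Java sketch under that assumption; the class and method names are illustrative only, not an existing OrientDB API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Minimal sketch of a protobuf-style varint (assumption: 7-bit groups with a
// continuation bit, zigzag for signed values). Illustrative names only.
final class VarInt {
    // Zigzag maps signed to unsigned so small negatives stay short:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigzagEncode(long v) { return (v << 1) ^ (v >> 63); }
    static long zigzagDecode(long v) { return (v >>> 1) ^ -(v & 1); }

    static void write(ByteArrayOutputStream out, long signedValue) {
        long v = zigzagEncode(signedValue);
        while ((v & ~0x7FL) != 0) {               // more than 7 bits remaining
            out.write((int) ((v & 0x7F) | 0x80)); // emit group with continuation bit set
            v >>>= 7;
        }
        out.write((int) v);                        // final group, high bit clear
    }

    static long read(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;  // accumulate 7-bit group
            if ((b & 0x80) == 0) break;            // continuation bit clear: done
            shift += 7;
        }
        return zigzagDecode(result);
    }
}
```

Every step here leans on unsigned shifts and full 64-bit arithmetic, which is trivial in Java but exactly what a single signed, platform-dependent integer type makes painful.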

Having a simplified alternative encoding option for variable-length integers isn’t just about simplifying implementation in scripting languages, though. The more important thing to consider is the trade-off between bandwidth overhead and CPU overhead.

For example, in a replication scenario, where two servers are physically far from each other, conserving bandwidth may be the most desirable option.

On the other hand, in a scenario where the application server is on the same local network as the database server, a bit more bandwidth overhead may be the better trade-off: it saves CPU overhead from encoding on the database server and from encoding/decoding on the application server (especially if the latter is running a client written in a scripting language like PHP or JS).

I would propose a very simple encoding scheme:

(type:byte)(value:bytes)

Where type is e.g. 1, 2, 3 or 17 (integer, short, long or byte) and the following value is a variable number of bytes depending on the type of integer.

For values below 127, that means one extra byte of overhead. For values in the 127-255 range it’s two bytes, the same as now, and for larger values it’s either a byte more than or the same as the current scheme.

Computationally though, this should be much simpler and faster to encode and decode - and much simpler to implement.
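As a rough sketch of the proposed scheme, using the type ids mentioned above (1 = integer, 2 = short, 3 = long, 17 = byte) and assuming big-endian fixed-width values; the class and method names are illustrative, not an existing API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Sketch of the proposed (type:byte)(value:bytes) encoding. The writer simply
// picks the smallest fixed-width type that fits the value.
final class SimpleInt {
    static final int INTEGER = 1, SHORT = 2, LONG = 3, BYTE = 17;

    static void write(ByteArrayOutputStream out, long v) {
        ByteBuffer buf;
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            out.write(BYTE);
            out.write((int) v);                       // 1 type byte + 1 value byte
            return;
        } else if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            out.write(SHORT);
            buf = ByteBuffer.allocate(2).putShort((short) v);
        } else if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            out.write(INTEGER);
            buf = ByteBuffer.allocate(4).putInt((int) v);
        } else {
            out.write(LONG);
            buf = ByteBuffer.allocate(8).putLong(v);
        }
        byte[] raw = buf.array();
        out.write(raw, 0, raw.length);                // fixed-width, big-endian value
    }

    static long read(ByteBuffer in) {
        int type = in.get() & 0xFF;
        switch (type) {
            case BYTE:    return in.get();
            case SHORT:   return in.getShort();
            case INTEGER: return in.getInt();
            case LONG:    return in.getLong();
            default:      throw new IllegalArgumentException("unknown type " + type);
        }
    }
}
```

There is no bit fiddling here; decoding is a single type dispatch followed by a fixed-width read, which is the “simpler and faster” property the proposal is after.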

I’d be curious to see a bandwidth benchmark with this change in place in a real-world scenario. In practice, the difference in bandwidth may be almost negligible, and defaulting to a simplified encoding may actually be the best choice, with slow networks being the only reasonable exception.

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 41 (26 by maintainers)

Top GitHub Comments

1 reaction
StarpTech commented, Aug 14, 2016

I have come to the same conclusion after years of experience with orientdb and the drivers. We need a more robust and standardized request/response protocol. I recommend http://www.grpc.io/: it is available for tons of languages, and it would give orientdb a big boost in the right direction.

It would solve fundamental problems:

  • Streaming support
  • Standardized Status Codes
  • Standardized protocol
  • Performance
  • Payload Agnostic

What are you going to do about all of these driver problems in orientdb? It would be very motivating if you could release some information about future plans. @luigidellaquila @tglman thanks!

0 reactions
smolinari commented, Aug 14, 2016

That’s too bad.

I am also pushing for a more robust driver standard (with the little influence I have). I am no expert, but I’ve been reading up on both gRPC and msgpack and both sound like a much better basis to work with than what is currently available, simply because they are already well supported and documented in many languages.

I too would like to know how much work would be involved in switching to one of these quasi-standards. It has to be what I would call an enabling step for ODB, because currently all the driver support is still on wobbly legs. You would think it would make the actual drivers much easier to program. And for sure, it is the language driver support that makes or breaks something like a database.

Scott
