Proposal: Use gRPC for the binary protocol
Preface
gRPC is an official extension of protobuf. It uses HTTP/2 flow control to implement RPC with protobuf as the message serialization format.
In fact, this proposal is a variation of #3240 with a few key differences.
I tried to make it as short as possible, but it is a large topic to discuss. If you have any questions about one of the statements, feel free to ask for a full description or context.
Proposal
Use gRPC as an alternative/experimental protocol listener for OrientDB 2.x. Not much effort is needed to implement it, and there is no need to replace the current binary protocol entirely. The current binary protocol could be dropped later, in the 3.x branch for example, if gRPC proves its worth. I believe it definitely will.
Problems with current protocol
- It’s tricky to implement a binary protocol client for a new language.
- Everything has to be written from scratch: flow control, serialization, pooling.
- The protocol spec/documentation is incorrect in places.
- A protocol change may break existing implementations.
- Protocol changes are hard to track.
All these problems were already described here.
I will assume this is enough to conclude that OrientDB needs another protocol implementation.
Why use protobuf?
- Industry standard.
- Extendable.
- Nearly every language has a library for it.
- Declarative syntax for protocol (acts as a spec).
- Generated code is efficient for each language (thanks to large number of contributors).
All the benefits were already described here.
But switching from the current serialization format to protobuf would solve only half of the problem. Implementing a protobuf listener may require a huge effort, as code from the current protocol cannot be reused. We would end up with yet another custom protocol flow that must be implemented for every language. I believe this will not happen anytime soon.
Why use gRPC?
gRPC uses the HTTP/2 standard for reliable request-response flow control. HTTP/2 is a binary protocol that multiplexes any number of request-response streams inside one connection. Each stream is split into frames, so concurrent requests do not interfere with each other. That’s it: one connection can serve any number of concurrent client requests, and gRPC automatically takes care of connection failures.
On the second layer, protobuf-defined messages are passed over HTTP/2. All messages are described in .proto files, like this:
```proto
message RID {
  sint64 clusterId = 1;
  sint64 clusterPos = 2;
}

message ReadRecordMetadataRequest {
  UserInfo info = 1;
  RID rid = 2;
}
```
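The UserInfo and ReadRecordMetadataResponse messages referenced here are not defined in this snippet. A hypothetical sketch (field names and numbers are illustrative, not from any actual spec) might look like:

```proto
// Illustrative only: these definitions are not part of the proposal's spec.
message UserInfo {
  string user = 1;   // database user name
  string token = 2;  // auth token issued by the server
}

message ReadRecordMetadataResponse {
  RID rid = 1;         // record identifier, as defined above
  sint64 version = 2;  // record version, e.g. for MVCC checks
}
```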
We could generate message encoders/decoders from these files for any language using protoc
(the protobuf compiler)… But we can go further and let gRPC generate the entire client/server code that uses these messages for communication. Starting from version 3, protoc supports gRPC service definitions:
```proto
service Storage {
  rpc ReadRecordMetadata(ReadRecordMetadataRequest) returns (ReadRecordMetadataResponse);
}
```
We can describe the whole remote storage API in this format. Then a single command generates code for any language that gRPC supports (C, C++, Java, Go, Node.js, Python, Ruby, Objective-C, PHP, C#).
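As a rough illustration, a single (hypothetical) storage.proto could lay out the whole API, with the generation commands as comments. The RPC names here are invented, loosely following the binary protocol's operations, and the exact protoc plugin flags vary per language and toolchain version:

```proto
// Generate code with, for example:
//   protoc --java_out=out/ --grpc-java_out=out/ storage.proto  (Java, via protoc-gen-grpc-java)
//   protoc --go_out=out/ --go-grpc_out=out/ storage.proto      (Go, via protoc-gen-go-grpc)
syntax = "proto3";

service Storage {
  // Illustrative RPCs, loosely mirroring binary protocol operations.
  rpc Connect(ConnectRequest) returns (ConnectResponse);
  rpc OpenDatabase(OpenDatabaseRequest) returns (OpenDatabaseResponse);
  rpc ReadRecord(ReadRecordRequest) returns (ReadRecordResponse);
  rpc ReadRecordMetadata(ReadRecordMetadataRequest) returns (ReadRecordMetadataResponse);
}
```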
But we will lose control over the transport layer, right? Things like streamed result sets will not be supported at all? Wrong. In gRPC you can describe an API that streams results and/or arguments:
```proto
service Storage {
  rpc ReadRecordMetadata(ReadRecordMetadataRequest) returns (stream ReadRecordMetadataResponse);
}
```
The code generated from this description will define a ReadRecordMetadata API function that accepts one ReadRecordMetadataRequest object and returns a stream of ReadRecordMetadataResponse objects.
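Streams can also run in the other direction, or both at once. For example, a hypothetical live-query API (message and RPC names invented for illustration) could be expressed as a server stream or a bidirectional stream:

```proto
service Storage {
  // Server streams change events back for a subscribed query (illustrative).
  rpc LiveQuery(LiveQueryRequest) returns (stream LiveQueryEvent);

  // Bidirectional: the client can adjust its subscription while events flow (illustrative).
  rpc LiveQueryManaged(stream LiveQueryRequest) returns (stream LiveQueryEvent);
}
```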
As an additional feature, gRPC defines an authorization scheme with several built-in implementations: insecure (disabled), TLS certificates, OAuth2. You can also easily implement a new one, or just use insecure mode and implement auth at the RPC level (as the current binary protocol does).
The question is: why not use such a powerful tool?
Benefits
- Standard protocol: protobuf over HTTP2.
- The API is described in a .proto file, which can be used to generate both server and client implementations.
- The whole API can be described, including types, errors, enums, streaming, etc.
- Easy to maintain for both server and client libs.
- Framing: connections will not block on large requests/responses.
- Streams: any number of requests/responses over one connection. No need for pooling (HTTP/2 will pool automatically if necessary).
- Built-in authentication support (TLS, OAuth2, bypass, custom). Can also be managed at the API level.
- Protocol implementation is supported by a large community.
Drawbacks
- Requires some effort to integrate (but much less than integrating protobuf by itself!).
- Serialization may cost some performance (likely negligible: user applications will run much faster thanks to streaming and framing, since one request does not block others, and protobuf itself is quite fast).
- @phpnode also stated that code generation can be a drawback. But it also allows generating faster code for some languages.
- Anything else?
Implementation
I have already described half of the binary protocol API in terms of gRPC. I even implemented a gRPC-to-BinaryProtocol proxy to test the spec. But I have no Java experience, so I cannot file a PR with the actual integration. There are also a few protocol questions I want to discuss if you like the idea overall. In short, there are a few options:
- Just describe the BinaryProtocol API in gRPC. Use authentication at the API level (Connect and OpenDatabase). Requests carry no sessionId, as that is managed by HTTP/2, but a token must be passed with every request for client identification. The API is not as convenient, but takes little effort to integrate.
- Optimize the API for gRPC. Use the native auth scheme (possibly implementing a custom module for OrientDB tokens). Requests carry neither tokens nor a sessionId. The API is convenient, but requires some additional effort to integrate with the OrientDB server.
I also want to point out that there is no need to dispatch requests/responses in the manner described in the previous proposal. We can just describe the API and all the types it accepts and returns, and let gRPC dispatch them to the appropriate handlers.
Pinging @phpnode, as he might be interested in this change, and @tglman, as it seems he is responsible for the network stuff.
Waiting to hear what you think about it 😃
Issue Analytics
- Created 8 years ago
- Comments: 25 (13 by maintainers)
Top GitHub Comments
@tglman Thanks for your response, I thought the idea of this proposal was abandoned long ago.
I want to add a few things regarding cons:
Regarding the alternative solution of stabilizing the protocol and docs: that would be a one-time fix, after which we could still expect further incompatible changes. Protobufs were designed to be backward-compatible, so it would be easier for developers to track protocol changes and update drivers accordingly. This will not be as easy with a custom protocol: things will break, developers will feel the lack of docs, and the OrientDB devs will have to keep updating those docs as a separate maintenance cost. Is it worth it? You could have one command for all of this.
Raw HTTP/2 is not going to be a 1-to-1 replacement. gRPC defines a way to send streams from both client and server, so live queries and result streams will benefit from this. It also defines how to handle authentication, connections, retries and so on. With the proposed H1 -> H2 upgrade, this would turn out to be just another custom protocol layer on top of H2 with a lack of docs 😉
Worth mentioning: gRPC supports generating REST-compatible APIs (OpenAPI/Swagger) from service definitions. Thus there can be one protocol (gRPC) and a REST reflection of it.
hi @smolinari,
From my experience, there is not just the cost of writing the code for the messages (usually 30-40% of the bootstrap cost and 10-20% of the maintenance cost). There is also the cost of providing a meaningful API for each environment (see the Node.js callback programming style) and of hiding implementation details: for example, that each record has a version you should not change, or that you need to keep a sessionId and a token for each session, etc. Sometimes we do expose these details, but that is usually troubling for the evolution and compatibility of the protocol.
For the specific case of protobuf, we did a POC replacing the record serialization with protobuf. Protobuf was 5% faster on CPU using a custom implementation rather than generated code (not sure that is good practice), and it used ~30% more space.
Bye