Proposal for a new Protobuf spec

In response to #504, I’d like to start a discussion on a new spec for Protobuf. (I’m not sure this is the right place, but it seems as good as any.)

Here’s my initial pass for comment:

syntax = "proto3";

package io.cloudevents.v1;

import "google/protobuf/any.proto";
import "google/protobuf/struct.proto";
import "google/protobuf/timestamp.proto";

message CloudEvent{
  // Required Fields
  string id = 1;
  string source = 2;
  string spec_version = 3 [json_name="specversion"];
  string type = 4;

  // Optional Fields
  string data_content_type = 5 [json_name="datacontenttype"];
  string data_schema = 6 [json_name="dataschema"];
  string subject = 7;
  google.protobuf.Timestamp time = 8;
  google.protobuf.Value data = 9;
  bytes data_binary = 10 [json_name="data_base64"];
  google.protobuf.Any data_proto = 11 [json_name="iocloudeventsprotodata"];
  google.protobuf.Struct extention_context = 12
    [json_name="iocloudeventsprotoextentioncontext"];
}

Pros

  • Serializes strictly to the JSON spec using default protobuf tooling
  • Uses google.protobuf well-known types to store anything you could in a JSON event
  • Strictly defines types for all required and optional attributes, but not for data or extensions
  • Handles time and base64 format conversion

Cons

  • Silently drops the extension context attributes of a JSON event when the event is converted into a protobuf using standard tooling
  • Adding and extracting data attributes is somewhat clumsy and unidiomatic in most languages, because those attributes don’t have specific or consistent types

If this is generally a good direction, I’m happy to turn this into a PR for further comment.

Example

Marshaling to and from a JSON event in Go yields:

Input:

{
    "id" : "C234-1234-1234",
    "source" : "/mycontext",
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "datacontenttype" : "application/json",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "data" : {
        "appinfoA" : "abc",
        "appinfoB" : 123,
        "appinfoC" : true
    }
}

Output:

{
    "id": "C234-1234-1234",
    "source": "/mycontext",
    "specversion": "1.0",
    "type": "com.example.someevent",
    "datacontenttype": "application/json",
    "time": "2018-04-05T17:31:00Z",
    "data": {
        "appinfoA": "abc",
        "appinfoB": 123,
        "appinfoC": true
    }
}

Go %+v output:

id:"C234-1234-1234" source:"/mycontext" spec_version:"1.0" type:"com.example.someevent" data_content_type:"application/json" time:<seconds:1522949460 > data:<struct_value:<fields:<key:"appinfoA" value:<string_value:"abc" > > fields:<key:"appinfoB" value:<number_value:123 > > fields:<key:"appinfoC" value:<bool_value:true > > > >
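
Note that the output above no longer contains comexampleextension1 and comexampleothervalue. As a rough analogy only (this uses Go's encoding/json with a fixed-field struct, not the protobuf tooling, and the `event` type and `roundTrip` helper are hypothetical names), the same kind of silent loss can be sketched like this:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event is a hypothetical fixed-field stand-in for the proposed CloudEvent
// message: it only knows the attributes that the schema names explicitly.
type event struct {
	ID              string          `json:"id"`
	Source          string          `json:"source"`
	SpecVersion     string          `json:"specversion"`
	Type            string          `json:"type"`
	DataContentType string          `json:"datacontenttype,omitempty"`
	Time            string          `json:"time,omitempty"`
	Data            json.RawMessage `json:"data,omitempty"`
}

// roundTrip decodes a JSON event into the fixed-field struct and encodes it
// again. Keys that the struct does not declare are ignored by encoding/json,
// so they are missing from the result without any error being reported.
func roundTrip(in []byte) ([]byte, error) {
	var e event
	if err := json.Unmarshal(in, &e); err != nil {
		return nil, err
	}
	return json.Marshal(e)
}

func main() {
	in := []byte(`{"id":"C234-1234-1234","source":"/mycontext","specversion":"1.0",` +
		`"type":"com.example.someevent","comexampleextension1":"value","comexampleothervalue":5}`)
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // the two comexample* keys are absent
}
```

Any schema that lists attributes as fixed fields has this property unless it also carries a catch-all container for unknown keys.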

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 9
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

6 reactions
evankanderson commented, Apr 10, 2020

… getting back to this from a conversation somewhere else, I’d echo a few points of @dan-j 👍

  1. I’d consider the automatic JSON -> Proto conversion to be an attractive nuisance here, and instead focus on the core data model of CloudEvents:
    • An unbounded collection of typed attributes, with a few well-defined ones.
    • A data payload which is a collection of bytes.
  2. Once a representation for a single CloudEvent is defined for protocol buffers, something like your service definition would be a reasonable one, though you may also want to define unary send and bidirectional request/response and streaming RPCs.

Note that proto tag mapping makes it difficult to handle the attributes, as there is no authority or central registry to assign tag numbers. An implementation attempting to transcribe a received message with an unknown tag number to some other format will be SOL, because proto does not encode the schema alongside the message.

On the bright side, while JSON format has difficulties representing a raw bytestream, proto can encode the payload as a byte field easily.
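
To illustrate the JSON side of that point: a minimal sketch, assuming Go's encoding/json as the JSON library (the `encodePayload` helper is a hypothetical name). JSON has no byte-string type, so raw bytes end up as a base64 string, which is what the data_base64 attribute in the JSON format is for.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodePayload shows how encoding/json represents raw bytes: as a
// base64-encoded JSON string, since JSON has no native byte-string type.
func encodePayload(payload []byte) (string, error) {
	b, err := json.Marshal(map[string][]byte{"data_base64": payload})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encodePayload([]byte{0x00, 0xff, 0x10})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"data_base64":"AP8Q"}
}
```

A proto `bytes` field carries the same three bytes verbatim, with a length prefix, and needs no text encoding step.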

My suggestion would be something like the following:

message Value {
  oneof value {
    bool bool_value = 1;
    int64 int_value = 2;
    string string_value = 3;  // Use this for URI and URI-reference
    bytes byte_value = 4;
    google.protobuf.Timestamp time_value = 5;
  }
}
message CloudEvent {
  // These attributes with tags 1-4 are required by the spec
  string id = 1;
  string type = 2;
  string source = 3;
  string spec_version = 4;

  // All other attributes should be encoded in "attributes", whether
  // they are defined in the core spec or an extension.
  map<string, Value> attributes = 5;

  // The event payload data. Note that it is valid to store an encoded protocol buffer
  // in the bytes data if desired.
  bytes data = 6;
}
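
For a feel of what consuming this shape looks like, here is a hand-written Go sketch of the same data model. It is not the code protoc would generate; the `Value`, `CloudEvent`, and `describe` names below are illustrative assumptions that only mirror the message layout above.

```go
package main

import (
	"fmt"
	"time"
)

// Value is a hand-written stand-in for the proposed oneof: V is expected to
// hold exactly one of bool, int64, string, []byte, or time.Time.
type Value struct {
	V interface{}
}

// CloudEvent mirrors the proposed message: four required attributes, a map
// for every other attribute, and an opaque byte payload.
type CloudEvent struct {
	ID, Type, Source, SpecVersion string
	Attributes                    map[string]Value
	Data                          []byte
}

// describe returns a short label for an attribute. Every consumer of the
// map needs a type switch like this one to get at the concrete value.
func describe(v Value) string {
	switch x := v.V.(type) {
	case bool:
		return fmt.Sprintf("bool:%t", x)
	case int64:
		return fmt.Sprintf("int:%d", x)
	case string:
		return "string:" + x
	case []byte:
		return fmt.Sprintf("bytes:%d", len(x))
	case time.Time:
		return "time:" + x.UTC().Format(time.RFC3339)
	default:
		return "unsupported"
	}
}

func main() {
	ev := CloudEvent{
		ID: "C234-1234-1234", Type: "com.example.someevent",
		Source: "/mycontext", SpecVersion: "1.0",
		Attributes: map[string]Value{
			"datacontenttype":      {V: "application/json"},
			"time":                 {V: time.Date(2018, 4, 5, 17, 31, 0, 0, time.UTC)},
			"comexampleothervalue": {V: int64(5)},
		},
		Data: []byte(`{"appinfoA":"abc"}`),
	}
	for _, k := range []string{"datacontenttype", "time", "comexampleothervalue"} {
		fmt.Println(k, "=>", describe(ev.Attributes[k]))
	}
}
```

Core optional attributes such as time and extension attributes such as comexampleothervalue go through the same map, so nothing is lost in transit, at the cost of a type switch on every read.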

My reasoning for encoding all non-required attributes in the map is as follows:

The main challenge in the protocol encoding is handling the upgrade case where one speaker promotes a field from the attributes map to a top-level tag, since proto binary encoding does not serialize much tag metadata (only the field number and the wire type, which is needed for length calculations). In this scheme, I’ve opted to direct all the optional attributes into the map to try to prevent users from stumbling into the field upgrade issue.
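
To make the "not much tag metadata" point concrete: in the protobuf wire format each field is preceded by a varint key equal to (field_number << 3) | wire_type. The sketch below decodes such a key using only the Go standard library; `decodeKey` is a hypothetical helper, not part of any protobuf package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeKey splits a protobuf field key (a varint) into the field number and
// the wire type. That is all the binary encoding says about a field: the
// field name and its declared type exist only in the .proto schema.
func decodeKey(buf []byte) (uint64, uint64) {
	key, _ := binary.Uvarint(buf)
	return key >> 3, key & 0x7
}

func main() {
	// 0x2A is the key for field 5 with wire type 2 (length-delimited),
	// which is how an entry of the "attributes" map above would start.
	num, wt := decodeKey([]byte{0x2A})
	fmt.Println(num, wt) // 5 2

	// 0x32 is field 6, wire type 2: the "data" bytes field.
	num, wt = decodeKey([]byte{0x32})
	fmt.Println(num, wt) // 6 2
}
```

A receiver whose schema lacks a given field number can skip the field using the wire type, but it cannot tell which attribute name the sender meant, which is why a promoted attribute would be invisible to older speakers.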

0 reactions
cbraynor commented, Jun 4, 2020

I took a quick look earlier in the week and you’re right that it looks like the recommendation above. I think the choice of oneof is fiddly and not particularly intuitive, at least in Go where I’m most familiar, and I’m not super happy with the ongoing maintainability (i.e. it needs updating if there’s a new type, e.g. int64), but I’m happy if other people agree and we can get some consensus.

To be fair, using google.protobuf.Struct and google.protobuf.Value is awkward too, but it does allow arbitrary nesting of data within the data field in a type-safe way without having to define your own proto, e.g. for transcoding JSON events.

I’ll miss the ~80% compatibility using regular tooling to transcode to the JSON spec and back, but I will concede that if it’s not 100% then it will cause issues somewhere.

Implementation beats oration - closing out this bug in favour of your PR, thank you
