handling of datacontenttype is inconsistent

CloudEvents 1.0

Consider this example, straight from the spec:

{
    ...
    "datacontenttype" : "text/xml",
    "data" : "<much wow=\"xml\"/>"
}

Clearly, data is some structure that has been encoded using the XML format and put into the event as a string (binary). Naturally, I’d assume the same behavior for JSON encoding:

{
    ...
    "datacontenttype" : "application/json",
    "data" : "{\"foo\": \"bar\"}"
}

However, that doesn’t seem to be the case; as the example in the HTTP protocol binding spec shows, the JSON object is not sent in its encoded form but rather nested into the event directly:

{
    ...
    "datacontenttype" : "application/json",
    "data" : {
        "foo": "bar"
    }
}

Note that removing the optional datacontenttype attribute doesn’t change this, as the spec clearly states:

A JSON-format event with no datacontenttype is exactly equivalent to one with datacontenttype=“application/json”.

To sum it up, it is not possible to put a JSON-encoded data blob into a CloudEvent, and a parser needs to treat application/json differently from any other datacontenttype.
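The special case described above can be made concrete with a minimal sketch. It assumes a structured-mode JSON event whose envelope has already been parsed; `readData` is a hypothetical helper for illustration, not part of any CloudEvents SDK.

```javascript
// Hypothetical helper, not part of any CloudEvents SDK.
// `event` is a JSON-format event whose envelope has already been parsed.
function readData(event) {
  // Per the spec text quoted above, a missing datacontenttype
  // is equivalent to "application/json".
  const contentType = event.datacontenttype ?? "application/json";

  if (contentType === "application/json") {
    // `data` is the JSON value itself. A string here is a JSON string,
    // not a JSON document in encoded form.
    return { kind: "json-value", value: event.data };
  }
  // For any other content type, a string `data` carries the encoded payload.
  return { kind: "encoded-text", contentType, text: event.data };
}

const xmlEvent = { datacontenttype: "text/xml", data: '<much wow="xml"/>' };
const stringEvent = { data: '{"foo": "bar"}' }; // no datacontenttype

console.log(readData(xmlEvent).kind);
console.log(readData(stringEvent).kind);
```

Under this reading, the second event is a plain JSON string, even though its content looks like an encoded JSON object.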

HTTP Protocol Binding 1.0

For structured content mode, the spec says:

The chosen event format defines how all attributes, and data, are represented.

Does this mean that datacontenttype must be present and set to the event format? Or does structured mode implicitly change the default of datacontenttype from application/json to whatever event format is in use? What if datacontenttype is present and set to a different encoding - must a parser treat this event as malformed?

JSON Event Format 1.0

As a side note, the JSON Format spec makes this even more confusing:

If the implementation determines that the type of data is Binary, the value MUST be represented as a JSON string expression containing the Base64 encoded binary value, and use the member name data_base64 to store it inside the JSON object.

This basically says that you have to Base64-encode any simple JSON string (which is, of course, binary). Also, if a receiver does not implement the optional (!) JSON Format spec, it won’t be able to parse the data_base64 value; consequently, implementing the JSON Format spec as a sender means not implementing the full CloudEvents spec.
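One possible reading of the quoted rule is that "Binary" refers to the runtime type of the payload rather than to its content. The sketch below follows that reading; `toJsonFormatMembers` and `fromJsonFormatMembers` are hypothetical names, and the choice of `Uint8Array` as the marker for binary data is an assumption of this example.

```javascript
// Hypothetical helpers illustrating one reading of the JSON Format rule.

// Choose between the `data` and `data_base64` members when writing an event.
function toJsonFormatMembers(data) {
  if (data instanceof Uint8Array) {
    // Assumption: only byte arrays (including Node.js Buffers) count as Binary.
    return { data_base64: Buffer.from(data).toString("base64") };
  }
  // Strings, numbers, objects, etc. are stored as ordinary JSON values.
  return { data };
}

// Recover the payload when reading an event.
function fromJsonFormatMembers(event) {
  if ("data_base64" in event) {
    return Buffer.from(event.data_base64, "base64");
  }
  return event.data;
}

console.log(toJsonFormatMembers("plain string"));
console.log(toJsonFormatMembers(Buffer.from("hi")));
```

With this reading a plain JSON string is never Base64-encoded; whether the spec intends that is exactly what the issue asks.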

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 52 (27 by maintainers)

Top GitHub Comments

2 reactions
krispenner commented, Nov 6, 2021

Regarding these two comments about data_json, data_text and data_base64: https://github.com/cloudevents/spec/issues/558#issuecomment-873890684 https://github.com/cloudevents/spec/issues/558#issuecomment-876218637

I would recommend handling it the same way it is done for protobuf. Only one of the data_* fields would be allowed. This way it is up to the producer to determine whether something is text, binary or JSON, without any hidden contract or ambiguity for the consumer or intermediary. Unfortunately I don’t see how this could be introduced as a non-breaking change. So we still need a solution for specversion: 1.0. This issue is on the agenda for today’s CloudEvents call. Let’s see if someone comes up with a clever proposal.

I realize this is about solving the issue in spec version 1.0 without a breaking change, but going beyond that, is there any discussion anywhere about a next version that would allow breaking changes like this?

I’d prefer to see a dataencoding attribute “re”-added with a value of either json, text or base64, and then only a single data attribute to hold the payload. I don’t see the benefit of instead defining the individual attributes data_json, data_text and data_base64 as mentioned. It sounds like a dataencoding attribute was once part of the spec but dropped; maybe it needs to be re-introduced. This would remove any “special” case of */json or */*+json for the datacontenttype attribute and simplify the whole confusion here. Or maybe I’m missing why it wouldn’t.

I also question the attribute naming formats for consistency. The other attributes are all lowercase, neither camel case nor snake case, so why is data_base64 all of a sudden using snake case? For consistency it should be database64. But to avoid this inconsistency altogether, and to avoid adding any more data_xxx fields later, I propose using data only and adding dataencoding to specify the encoding format.

This issue is still open, so I thought I would add my suggestion. I’m a bit confused by the merges as to whether this is considered fixed for spec 1.0 or not now, but I’m suggesting how I think it could be simplified for a future version anyways.

Examples

JSON as JSON

If dataencoding is json, then only datacontenttype of */json or */*+json is allowed.

"dataencoding": "json",
"datacontenttype": "application/json",
"data": {
    "value": 1
}

To read this would be: var value = event.data.value;

JSON as text

"dataencoding": "text",
"datacontenttype": "application/json",
"data": "{ \"value\": 1 }"

To read this would be: var value = parseJson(event.data).value;

XML as text

"dataencoding": "text",
"datacontenttype": "application/xml",
"data": "<much wow=\"xml\"/>"

To read this would be: var wow = parseXml(event.data).attr("wow");

JSON as bytes

"dataencoding": "base64",
"datacontenttype": "application/json",
"data": "eyJ2YWx1ZSI6MX0="

To read this would be: var value = parseJson(toUtf8String(fromBase64(event.data))).value;

XML as bytes

"dataencoding": "base64",
"datacontenttype": "application/xml",
"data": "PG11Y2ggd293PSJ4bWwiLz4="

To read this would be: var wow = parseXml(toUtf8String(fromBase64(event.data))).attr("wow");

Binary as bytes

"dataencoding": "base64",
"datacontenttype": "image/png",
"data": "c29tZWltYWdlZGF0YQ=="

To read this would be: var imageBytes = fromBase64(event.data);
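The examples above could share a single decoding step on the consumer side. This is a minimal sketch of the proposal only: `dataencoding` is not an attribute in CloudEvents 1.0, and `decodeData` is a hypothetical function name.

```javascript
// Sketch of a consumer for the PROPOSED `dataencoding` attribute.
// `dataencoding` is not part of CloudEvents 1.0; `decodeData` is hypothetical.
function decodeData(event) {
  switch (event.dataencoding) {
    case "json":
      // `data` is already a JSON value (object, array, string, number, ...).
      return event.data;
    case "text":
      // `data` is a string; the caller parses it according to datacontenttype.
      return event.data;
    case "base64":
      // `data` is Base64 text; return the raw bytes as a Node.js Buffer.
      return Buffer.from(event.data, "base64");
    default:
      throw new Error(`unsupported dataencoding: ${event.dataencoding}`);
  }
}

const xmlAsBytes = {
  dataencoding: "base64",
  datacontenttype: "application/xml",
  data: "PG11Y2ggd293PSJ4bWwiLz4=",
};

console.log(decodeData(xmlAsBytes).toString("utf8"));
```

The consumer never has to inspect datacontenttype to decide how data is wrapped; it only needs it afterwards to pick a parser for the decoded payload.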

Thank you.

2 reactions
dazuma commented, Jun 24, 2021

@duglin @deissnerk This is coming up in my work on the Ruby SDK, and I want to bring up a clarification question.

To summarize a conclusion from above:

In the following CE:

const ce1 = new CloudEvent({
  specversion: "1.0",
  id: "C234-1234-1234",
  source: "/mycontext",
  type: "com.example.someevent",
  datacontenttype: "application/json",
  data: "{\"foo\": \"bar\"}"
});

… it sounds like the data should be considered a JSON value of type string. The fact that the string’s value happens to look like serialized JSON is irrelevant. It is simply a string. Therefore, if we were to serialize this CE in HTTP Binary mode, it might look like this:

CE-SpecVersion: 1.0
CE-Type: com.example.someevent
CE-Source: /mycontext
CE-ID: C234-1234-1234
Content-Type: application/json

"{\"foo\": \"bar\"}"

The data must be “escaped” in this way, so that a receiver parsing this content with the application/json content type will end up with a JSON string and not an object.

As a corollary, when deserializing an HTTP Binary mode CE with Content-Type: application/json, the HTTP protocol handler must parse the JSON and set the data attribute in memory to the actual JSON value (rather than the string representation of the JSON document). Otherwise, the content’s semantics will change when the CE gets re-serialized. And this, of course, all implies that an SDK’s HTTP protocol handler (and perhaps other protocol handlers as well) must understand JSON, even if the JSON structured format is not in use.
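A minimal sketch of the round trip described above, assuming plain objects in place of SDK `CloudEvent` instances. `toBinaryModeBody` and `fromBinaryModeBody` are hypothetical names, and passing other content types through unchanged is an assumption of this example.

```javascript
// Hypothetical helpers; not the API of any CloudEvents SDK.
const JSON_TYPE = "application/json";

// Render the HTTP body for binary content mode.
function toBinaryModeBody(event) {
  if (event.datacontenttype === JSON_TYPE) {
    // Serialize the in-memory JSON value. A string becomes a quoted,
    // escaped JSON string, so the receiver gets a string back, not an object.
    return JSON.stringify(event.data);
  }
  // Assumption: for other content types, string data is written out unchanged.
  return String(event.data);
}

// Rebuild the in-memory data from a binary-mode HTTP body.
function fromBinaryModeBody(contentType, body) {
  return contentType === JSON_TYPE ? JSON.parse(body) : body;
}

const ce1 = { datacontenttype: JSON_TYPE, data: '{"foo": "bar"}' };
const body = toBinaryModeBody(ce1);
const roundTripped = fromBinaryModeBody(JSON_TYPE, body);

console.log(body);
console.log(typeof roundTripped);
```

Because the handler parses on receipt and serializes on send, the string stays a string across re-serialization, which is the property the paragraph above asks for.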

Taking that as given, consider this implication:

Earlier a comparison was made with application/xml, noting a possible inconsistency. Consider this parallel example:

const ce2 = new CloudEvent({
  specversion: "1.0",
  id: "C234-1234-1234",
  source: "/mycontext",
  type: "com.example.someevent",
  datacontenttype: "application/xml",
  data: "<much wow=\"xml\"/>"
});

If we were to treat this XML data consistently with how we treated the earlier JSON data, we would consider this data as a string node in an XML document, whose contents just happen to look like XML. Hence, serializing this in HTTP Binary mode might yield something like:

CE-SpecVersion: 1.0
CE-Type: com.example.someevent
CE-Source: /mycontext
CE-ID: C234-1234-1234
Content-Type: application/xml

&lt;much wow="xml"/&gt;

However, my understanding of the spec, and my understanding of the current behavior of the SDKs, suggests we are not doing that. (And indeed I’m glad, because that would, in turn, imply that all protocol handlers would also need to understand XML.) Instead, we actually consider the above data as semantically an XML document and not a string. Hence, serializing this in HTTP Binary mode actually looks like:

CE-SpecVersion: 1.0
CE-Type: com.example.someevent
CE-Source: /mycontext
CE-ID: C234-1234-1234
Content-Type: application/xml

<much wow="xml"/>

In other words, our handling of the XML content-type appears to be inconsistent with our handling of the JSON content-type.

So my clarification question is:

  1. Am I correct in my interpretation that the spec intentionally treats data with content-type application/json specially, differently from string data with content-type application/xml (or indeed any other content-type), as illustrated above?

If so, follow-up questions:

  1. Is the reason for this that we (for some reason) consider JSON uniquely special among all content types in the universe, or is the reason simply that the spec currently happens to include a JSON format but not an XML format to define how data with that datacontenttype is rendered? Suppose a future spec version adds an XML format, YAML format, Protobuf format, etc. Would we at that time need to change the behavior of those formats to be like JSON (which would be a breaking change)?
  2. How do we precisely identify which content types are to be treated in this special way? For example, application/json is obvious, but what if the datacontenttype is itself application/cloudevents+json (i.e. a cloudevent whose payload is another cloudevent)? If we do consider JSON special, it seems it might be a good idea for the spec to state that explicitly, and define how it is identified, perhaps with reference to fields in RFC 2046 or similar.
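One candidate rule for the second follow-up question is the `*/json` or `*/*+json` pattern mentioned elsewhere in this thread. The sketch below implements only that pattern; `isJsonContentType` is a hypothetical helper, and whether the spec intends this rule is the open question here.

```javascript
// Hypothetical helper: treats "*/json" and "*/*+json" media types as JSON.
// This is one candidate rule from the thread, not something the spec defines.
function isJsonContentType(contentType) {
  if (typeof contentType !== "string") return false;

  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  const [type, subtype] = mediaType.split("/");
  if (!type || !subtype) return false;

  return subtype === "json" || subtype.endsWith("+json");
}

console.log(isJsonContentType("application/json"));
console.log(isJsonContentType("application/cloudevents+json"));
console.log(isJsonContentType("application/xml"));
```

Under this rule a CloudEvent nested inside another CloudEvent (application/cloudevents+json) would get the same special handling as application/json.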