handling of datacontenttype is inconsistent
CloudEvents 1.0
Consider this example, straight from the spec:
{
...
"datacontenttype" : "text/xml",
"data" : "<much wow=\"xml\"/>"
}
Clearly, data is some structure that has been encoded using the XML format and put into the event as a string (binary). Naturally, I’d assume the same behavior for JSON encoding:
{
...
"datacontenttype" : "application/json",
"data" : "{\"foo\": \"bar\"}"
}
However, that doesn’t seem to be the case; as the example in the HTTP protocol binding spec shows, the JSON object is not sent in its encoded form but rather nested into the event directly:
{
...
"datacontenttype" : "application/json",
"data" : {
"foo": "bar"
}
}
Note that removing the optional datacontenttype attribute doesn’t change this, as the spec clearly states:
A JSON-format event with no datacontenttype is exactly equivalent to one with datacontenttype=“application/json”.
To sum it up, it is not possible to put a JSON-encoded data blob into a CloudEvent, and a parser needs to treat application/json differently from any other datacontenttype.
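To make the asymmetry concrete, here is a minimal JavaScript sketch. It assumes the two example events above arrive as JSON text in structured mode; the attributes elided with "..." are simply left out, and the variable names are made up for illustration. After the envelope is parsed, XML data is still an encoded string while JSON data is already a decoded object:

```javascript
// Two structured-mode events as they might arrive on the wire.
// Other attributes (id, source, type, specversion) are omitted for brevity.
const xmlEventText = '{"datacontenttype": "text/xml", "data": "<much wow=\\"xml\\"/>"}';
const jsonEventText = '{"datacontenttype": "application/json", "data": {"foo": "bar"}}';

const xmlEvent = JSON.parse(xmlEventText);
const jsonEvent = JSON.parse(jsonEventText);

// XML payload: still encoded, the receiver has to run an XML parser on it.
console.log(typeof xmlEvent.data);  // string
// JSON payload: already decoded as a side effect of parsing the envelope.
console.log(typeof jsonEvent.data); // object
```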
HTTP Protocol Binding 1.0
For structured content mode, the spec says:
The chosen event format defines how all attributes, and data, are represented.
Does this mean that datacontenttype must be present and set to the event format? Or does structured mode implicitly change the default of datacontenttype from application/json to whatever event format is in use? What if datacontenttype is present and set to a different encoding - must a parser treat this event as malformed?
JSON Event Format 1.0
As a side note, the JSON Format spec makes this even more confusing:
If the implementation determines that the type of data is Binary, the value MUST be represented as a JSON string expression containing the Base64 encoded binary value, and use the member name data_base64 to store it inside the JSON object.
This basically says that you have to Base64-encode any simple JSON string (which is, of course, binary). Also, if a receiver does not implement the optional (!) JSON Format spec, it won’t be able to parse the data_base64 value; consequently, implementing the JSON Format spec as a sender means not implementing the full CloudEvents spec.
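As a rough illustration of the extra work this puts on a receiver, here is a sketch that assumes Node.js (for Buffer). readPayload is a hypothetical helper, not part of any SDK:

```javascript
// Hypothetical helper: returns the payload of an already-parsed JSON-format event.
// Binary payloads are carried as Base64 text in data_base64; anything else is in data.
function readPayload(event) {
  if ("data_base64" in event) {
    return Buffer.from(event.data_base64, "base64"); // raw bytes
  }
  return event.data; // whatever JSON value the sender put there
}

const binaryEvent = JSON.parse('{"data_base64": "aGVsbG8="}');
const jsonEvent2 = JSON.parse('{"datacontenttype": "application/json", "data": {"foo": "bar"}}');

console.log(readPayload(binaryEvent).toString("utf8")); // hello
console.log(readPayload(jsonEvent2).foo);               // bar
```

A receiver that only looks at data would see no payload at all in the first event.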
Issue Analytics
- Created 4 years ago
- Reactions: 2
- Comments: 52 (27 by maintainers)
Top GitHub Comments
Regarding these two comments about data_json, data_text and data_base64:
https://github.com/cloudevents/spec/issues/558#issuecomment-873890684
https://github.com/cloudevents/spec/issues/558#issuecomment-876218637
I realize this is about solving the issue in spec version 1.0 without making a breaking change, but going beyond that, is there any discussion anywhere about the next version that would allow for breaking changes like this?
I’d prefer to see a dataencoding attribute “re”-added with a value of either json, text or base64, and then only a single data attribute to hold the payload. I’m not seeing the benefit of instead defining individual attributes such as data_json, data_text or data_base64. It sounds like a dataencoding attribute was once part of the spec but dropped; maybe it needs to be re-introduced. This would remove any “special” case of */json or */*+json for the datacontenttype attribute and clear up the whole confusion here. Or maybe I’m missing why it wouldn’t.
I also question the attribute naming formats for consistency. The other attributes are all lowercase, neither camel nor snake case, so why is data_base64 all of a sudden using snake case? For consistency it should be database64. But to avoid this inconsistency altogether and to avoid adding any more data_xxx fields later, I propose using only data and adding dataencoding to specify the encoding format.
This issue is still open, so I thought I would add my suggestion. I’m a bit confused by the merges as to whether this is now considered fixed for spec 1.0 or not, but I’m suggesting how I think it could be simplified for a future version anyway.
Examples
JSON as JSON
If dataencoding is json, then only datacontenttype of */json or */*+json is allowed.
To read this would be:
var value = event.data.value;
JSON as text
To read this would be:
var value = parseJson(event.data).value;
XML as text
To read this would be:
var wow = parseXml(event.data).attr("wow");
JSON as bytes
To read this would be:
var value = parseJson(toUtf8String(fromBase64(event.data))).value;
XML as bytes
To read this would be:
var wow = parseXml(toUtf8String(fromBase64(event.data))).attr("wow");
Binary as bytes
To read this would be:
var imageBytes = fromBase64(event.data);
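Putting the cases above together, a reader for this proposal could be sketched roughly as follows. This assumes Node.js (for Buffer). dataencoding is the attribute proposed here, not part of CloudEvents 1.0, and readData and the sample events are hypothetical. Parsing the decoded text or bytes according to datacontenttype is left to the caller:

```javascript
// Sketch of a reader for the proposed (non-standard) dataencoding attribute.
function readData(event) {
  switch (event.dataencoding) {
    case "json":
      return event.data; // already a JSON value
    case "text":
      return event.data; // a string; still has to be parsed per datacontenttype
    case "base64":
      return Buffer.from(event.data, "base64"); // raw bytes
    default:
      throw new Error("unknown dataencoding: " + event.dataencoding);
  }
}

const jsonText = '{"value": 1}';
const asJson = { dataencoding: "json", datacontenttype: "application/json", data: { value: 1 } };
const asText = { dataencoding: "text", datacontenttype: "application/json", data: jsonText };
const asBytes = {
  dataencoding: "base64",
  datacontenttype: "application/json",
  data: Buffer.from(jsonText, "utf8").toString("base64"),
};

console.log(readData(asJson).value);                               // 1
console.log(JSON.parse(readData(asText)).value);                   // 1
console.log(JSON.parse(readData(asBytes).toString("utf8")).value); // 1
```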
Thank you.
@duglin @deissnerk This is coming up in my work on the Ruby SDK, and I want to bring up a clarification question.
To summarize a conclusion from above:
In the following CE:
… it sounds like the data should be considered a JSON value of type string. The fact that the string’s value happens to look like serialized JSON is irrelevant. It is simply a string. Therefore, if we were to serialize this CE in HTTP Binary mode, it might look like this:
The data must be “escaped” in this way, so that a receiver parsing this content with the application/json content type will end up with a JSON string and not an object.
As a corollary, when deserializing an HTTP Binary mode CE with Content-Type: application/json, the HTTP protocol handler must parse the JSON and set the data attribute in memory to the actual JSON value (rather than the string representation of the JSON document). Otherwise, the content’s semantics will change when the CE gets re-serialized. And this, of course, all implies that an SDK’s HTTP protocol handler (and perhaps other protocol handlers as well) must understand JSON, even if the JSON structured format is not in use.
Taking that as given, consider this implication:
Earlier a comparison was made with application/xml, noting a possible inconsistency. Consider this parallel example:
If we were to treat this XML data consistently with how we treated the earlier JSON data, we would consider this data as a string node in an XML document, whose contents just happen to look like XML. Hence, serializing this as HTTP Binary might yield something like:
However, my understanding of the spec, and my understanding of the current behavior of the SDKs, suggests we are not doing that. (And indeed I’m glad, because that would, in turn, imply that all protocol handlers would also need to understand XML.) Instead, we actually consider the above data as semantically an XML document and not a string. Hence, serializing this as HTTP Binary actually looks like:
In other words, our handling of the XML content-type appears to be inconsistent with our handling of the JSON content-type.
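The asymmetry can be sketched in JavaScript as follows. This illustrates the behavior described in this comment and is not code from any SDK. toBinaryBody and fromBinaryBody are hypothetical names, and only the HTTP body is modeled, not the headers:

```javascript
// Hypothetical HTTP binary-mode body handling, as described in this comment.
function isJson(contentType) {
  return contentType === "application/json";
}

function toBinaryBody(contentType, data) {
  // JSON: data is a JSON *value*, so it is serialized (strings get quoted and escaped).
  // Anything else: string data is taken to be the document itself and sent as-is.
  return isJson(contentType) ? JSON.stringify(data) : data;
}

function fromBinaryBody(contentType, body) {
  return isJson(contentType) ? JSON.parse(body) : body;
}

const jsonLookingString = '{"foo": "bar"}';
const xmlLookingString = '<much wow="xml"/>';

console.log(toBinaryBody("application/json", jsonLookingString)); // "{\"foo\": \"bar\"}"
console.log(toBinaryBody("application/xml", xmlLookingString));   // <much wow="xml"/>
```

Both values round-trip unchanged through fromBinaryBody, but only the JSON path requires the protocol handler to understand the payload format.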
So my clarification question is:
application/json specially, differently from string data with content-type application/xml (or indeed any other content-type), as illustrated above?
If so, follow-up questions:
application/json is obvious, but what if the datacontenttype is itself application/cloudevents+json (i.e. a cloudevent whose payload is another cloudevent)?
If we do consider JSON special, it seems it might be a good idea for the spec to state that explicitly, and define how it is identified, perhaps with reference to fields in RFC 2046 or similar.