Canonical JSON is unclear
We’re having an internal debate and we’re not sure we’re using CJSON in the correct way. Let’s say we want to calculate a signature over this:
{
"quux": 123,
"foo": "bar\nbaz"
}
So what we do is:
priv = get_my_key()
cjson = encode_canonical(load_json())
sig = sign(priv, cjson)
Should the output of the function be A or B?
A:
{"foo":"bar
baz","quux":123}
B:
{"foo":"bar\nbaz","quux":123}
My claim is A is correct, but others claim B is correct. Though in some sense it doesn’t really matter which is used so long as all clients use the same (assuming both are deterministic).
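To make the difference concrete, here is a minimal Python sketch (not taken from the issue) that builds both candidate outputs for the example object and compares them. The names option_a, option_b, and is_valid_json are hypothetical, and SHA-256 stands in for whatever digest the real sign() would use. It suggests the choice only "doesn't matter" if every client makes the same one, because the two byte strings hash differently, and only B parses as JSON under Python's default strict parser.

```python
import hashlib
import json

doc = {"quux": 123, "foo": "bar\nbaz"}

# Option B: what json.dumps emits; the newline becomes backslash + 'n'.
option_b = json.dumps(doc, sort_keys=True, separators=(",", ":"))

# Option A: the same text with that two-character escape replaced by a raw newline.
option_a = option_b.replace("\\n", "\n")

# The signed bytes differ, so the digests (and therefore signatures) differ.
digest_a = hashlib.sha256(option_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(option_b.encode("utf-8")).hexdigest()


def is_valid_json(text):
    """Return True if Python's default (strict) JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(digest_a == digest_b)
print(is_valid_json(option_a), is_valid_json(option_b))
```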
Issue Analytics
- State:
- Created 6 years ago
- Comments:27 (12 by maintainers)
Top GitHub Comments
Yes, both json.dumps() and the canonicaljson package appear to escape \n in strings, whereas CJSON preserves characters, with the exception of the double quote and backslash characters.
Since TUF metadata is not written in canonical JSON, and we never need to load CJSON with json.dumps() or json.loads(), this is not an issue in the reference implementation. We only use CJSON to calculate digests and signatures.
The canonicaljson module unfortunately doesn’t follow all of the restrictions outlined in the OLPC Wiki.
The PyPI package canonical_json is wrong. The code in securesystemslib is (I haven’t fully audited it) correct.
Here is a Rust implementation: https://vtllf.org/rustdoc/canonical_json/src/canonical_json/src/ser.rs.html#1-828
There are exactly two escapes, and you are allowed to have completely arbitrary input strings.
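As an illustration of the "exactly two escapes" rule, here is a minimal Python sketch of OLPC-style string encoding. The function name encode_olpc_string is hypothetical, and encoding the input as UTF-8 is an assumption of this sketch; it is not the securesystemslib or Rust code referenced above.

```python
def encode_olpc_string(value: str) -> bytes:
    """Encode a string OLPC-style: escape only backslash and double quote.

    Assumption for this sketch: the input is a Python str and is emitted as UTF-8.
    """
    # Escape the backslash first so the backslashes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return b'"' + escaped.encode("utf-8") + b'"'


# The newline passes through as a raw byte, which is output A from the question.
print(encode_olpc_string("bar\nbaz"))
```

Every other character, control characters included, is copied through unchanged, which is why the result is not guaranteed to be valid JSON.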
There are two answers.
Answer 1
If we ignore the incorrect sentence in the OLPC definition and follow the grammar that is defined on the wiki, then the output is not JSON, and we continue using the OLPC definition regardless.
Answer 2
We define CJSON as:
Canonical JSON is the subset of JSON where:
There is no way to say that the output of the OLPC grammar is JSON. It is not. You are allowed to have arbitrary strings as input to the encode function and (among other things) newlines are not escaped in the output. JSON escapes Unicode as \uXXXX, but CJSON says "any byte except \ and "", which actually means that the output of CJSON doesn’t even have to be Unicode at all (which itself makes for a really terrible spec since the assumption is that it has strings as input).
If we pick Answer 2 (which is the most sane) then all references to the OLPC wiki need to be removed and securesystemslib needs to have its function changed to match. In fact, you can do this with json.dumps(my_json, sort_keys=True, separators=(',', ':')) plus a recursive function that errors on floats.
I don’t care if Answer 1 or Answer 2 becomes the spec. I just care that whatever is being used now is not fully understood by everyone and relies on the ambiguous OLPC grammar, which is in direct contradiction to the first sentence of its own spec.
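The Answer 2 approach can be sketched in Python roughly as follows. The helper names reject_floats and encode_canonical_subset are hypothetical, and this is an illustration of the suggestion in the comment, not the securesystemslib implementation or an agreed spec. Note that json.dumps takes separators as (item_separator, key_separator), so the compact form is (',', ':').

```python
import json


def reject_floats(value):
    """Recursively raise TypeError if any float appears in the structure."""
    if isinstance(value, float):
        raise TypeError("floats are not allowed in canonical JSON: %r" % (value,))
    if isinstance(value, dict):
        for item in value.values():
            reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            reject_floats(item)


def encode_canonical_subset(value) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    reject_floats(value)
    # ensure_ascii defaults to True, so the output is pure ASCII and valid JSON.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return text.encode("ascii")


print(encode_canonical_subset({"quux": 123, "foo": "bar\nbaz"}))
```

With this sketch the example object from the question serializes to output B, since json.dumps escapes the newline.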