Canonical JSON is unclear

We’re having an internal debate and we’re not sure we’re using Canonical JSON (CJSON) in the correct way. Let’s say we want to calculate a signature over this:

{
   "quux": 123,
   "foo": "bar\nbaz"
}

So what we do is:

priv = get_my_key()
cjson = encode_canonical(load_json())
sig = sign(priv, cjson)

Should the output of the function be A or B?

A:
{"foo":"bar
baz","quux":123}

B:
{"foo":"bar\nbaz","quux":123}

My claim is that A is correct, but others claim B is correct. In some sense, though, it doesn’t really matter which is used so long as all clients use the same one (assuming both are deterministic).
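To make concrete why all clients have to agree, here is a minimal sketch that hashes the two candidate outputs. It uses SHA-256 from the standard library as a stand-in for the sign() step, which is an assumption; the variable names are illustrative only.

```python
import hashlib

# Candidate A: the newline is emitted as a raw byte (0x0A) inside the string.
candidate_a = '{"foo":"bar\nbaz","quux":123}'.encode("utf-8")

# Candidate B: the newline is emitted as the two-character escape \n.
candidate_b = '{"foo":"bar\\nbaz","quux":123}'.encode("utf-8")

# A digest (and therefore a signature) is computed over bytes, so the two
# encodings of the same logical object are not interchangeable.
digest_a = hashlib.sha256(candidate_a).hexdigest()
digest_b = hashlib.sha256(candidate_b).hexdigest()

print(digest_a == digest_b)
```

A signature made over one form will not verify against the other, so the choice only stops mattering once every implementation makes the same one.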

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 27 (12 by maintainers)

Top GitHub Comments

vladimir-v-diaz commented, Jun 12, 2017

Yes, both json.dumps() and the canonicaljson package appear to escape \n in strings, whereas CJSON preserves characters, with the exception of the double quote and backslash characters.

>>> import json
>>> json_loaded = {'message': 'Hello\nworld'}
>>> json.dumps(json_loaded)
#'{"message": "Hello\\nworld"}'

>>> from canonicaljson import encode_canonical_json as ecj
>>> ecj(json_loaded)
#'{"message":"Hello\\nworld"}'

>>> from securesystemslib.formats import encode_canonical as cjson
>>> cjson(json_loaded)
#u'{"message":"Hello\nworld"}'

Since TUF metadata is not written in canonical JSON, and the reference implementation never needs to produce or load CJSON with json.dumps() or json.loads(), this is not an issue in the reference implementation. We only use CJSON to calculate digests and signatures.

The canonicaljson module unfortunately doesn’t follow all of the restrictions outlined in the OLPC Wiki.

heartsucker commented, Jun 10, 2017

The PyPI package canonical_json is wrong. The code in securesystemslib is (I haven’t fully audited it) correct.

Here is a Rust implementation: https://vtllf.org/rustdoc/canonical_json/src/canonical_json/src/ser.rs.html#1-828

const QU: u8 = b'"';  // \x22
const BS: u8 = b'\\'; // \x5C

// Lookup table of escape sequences. A value of b'x' at index i means that byte
// i is escaped as "\x" in JSON. A value of 0 means that byte i is not escaped.
#[cfg_attr(rustfmt, rustfmt_skip)]
static ESCAPE: [u8; 256] = [
    //  1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 0
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 1
    0,  0, QU,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 2
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 3
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 4
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, BS,  0,  0,  0, // 5
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 6
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 7
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 8
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // 9
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // A
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // B
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // C
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // D
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // E
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, // F
];

There are exactly two escapes, and you are allowed to have completely arbitrary input strings.
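For readers more at home in Python, the same two-escape rule can be sketched as below. This is an illustration of the rule expressed by the lookup table, not the securesystemslib code; the function name is hypothetical.

```python
def escape_canonical_string(text: str) -> str:
    """Quote a string, escaping only the backslash and the double quote.

    Every other character, including newlines and other control
    characters, is passed through unchanged.
    """
    # Escape backslashes first so the backslashes added for quotes
    # are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# The newline stays a raw newline inside the quoted output.
print(escape_canonical_string("bar\nbaz"))
```

Under this rule the example from the question encodes as form A, with the line break inside the string.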

There are two answers.

Answer 1

We ignore the incorrect sentence in the OLPC definition and follow the grammar that is defined on the wiki. The output is then not JSON, and we continue using the OLPC definition regardless.

Answer 2

We define CJSON as:

Canonical JSON is the subset of JSON where:

  1. Floats are not allowed
  2. There is no whitespace outside of strings
  3. Keys in objects are ordered by their byte representations

There is no way to say that the output of the OLPC grammar is JSON. It is not. You are allowed to have arbitrary strings as input to the encode function and (among other things) newlines are not escaped in the output. JSON escapes Unicode as \uXXXX, but CJSON says "any byte except \ and "", which actually means that the output of CJSON doesn’t even have to be Unicode at all (which itself makes for a really terrible spec, since the assumption is that it has strings as input).
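The claim that the OLPC-style output is not JSON can be checked with Python's standard json module, which by default rejects unescaped control characters inside strings. A minimal sketch, with illustrative variable names:

```python
import json

# OLPC-style output: the newline inside the string is a raw 0x0A byte.
olpc_style = '{"foo":"bar\nbaz","quux":123}'

# JSON-style output: the newline is written as the escape sequence \n.
json_style = '{"foo":"bar\\nbaz","quux":123}'

try:
    json.loads(olpc_style)
    olpc_parses = True
except json.JSONDecodeError:
    # The default strict mode refuses raw control characters in strings.
    olpc_parses = False

parsed = json.loads(json_style)
print(olpc_parses, parsed["foo"] == "bar\nbaz")
```

Passing strict=False to json.loads() would accept the first form, but that is a leniency of this parser, not something the JSON grammar allows.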

If we pick Answer 2 (which is the most sane), then all references to the OLPC wiki need to be removed and securesystemslib needs to have its function changed to match. In fact, you can do this with json.dumps(my_json, sort_keys=True, separators=(',', ':')) plus a recursive function that errors on floats.
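A minimal sketch of the Answer 2 approach, assuming the three rules listed above are the whole definition. The function names are hypothetical. Note that json.dumps() takes separators as (item_separator, key_separator), so the compact form is (',', ':').

```python
import json


def reject_floats(value):
    """Recursively raise ValueError if any float appears in the structure."""
    if isinstance(value, float):
        raise ValueError("floats are not allowed: %r" % (value,))
    if isinstance(value, dict):
        for item in value.values():
            reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            reject_floats(item)


def encode_json_subset(value) -> bytes:
    """Encode to compact JSON with sorted keys, refusing floats."""
    reject_floats(value)
    # sort_keys orders keys by code point, which matches UTF-8 byte order.
    # With the default ensure_ascii=True the output is pure ASCII.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return text.encode("ascii")


print(encode_json_subset({"quux": 123, "foo": "bar\nbaz"}))
```

For the example in the question this yields form B, and the result can be loaded again by any JSON parser.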

I don’t care if Answer 1 or Answer 2 becomes the spec. I just care that whatever is being used now is not fully understood by everyone and relies on the ambiguous OLPC grammar, which is in direct contradiction to the first sentence of its own spec.
