find_json() may output malformed JSON due to naive splitlines() usage

The find_json function in salt.utils.json is designed to extract valid JSON from a raw string which may begin with invalid JSON (e.g. due to a plain text header preceding the JSON data). It does this by calling Python’s splitlines() on the raw string and then attempting to load the result as JSON, dropping lines from the beginning one at a time until parsing succeeds or no lines remain.
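For readers unfamiliar with the approach, here is a minimal sketch of the behaviour described above. It is not Salt's actual implementation; the name find_json_sketch and the choice to rejoin the lines with "\n" are assumptions made for illustration.

```python
import json


def find_json_sketch(raw):
    """Simplified illustration of the described approach (not Salt's code)."""
    # splitlines() removes every line terminator it recognises.
    lines = raw.splitlines()
    for start in range(len(lines)):
        # Rejoin the remaining lines with plain newlines and try to parse.
        candidate = "\n".join(lines[start:])
        try:
            return json.loads(candidate)
        except ValueError:
            # Not valid JSON yet: drop one more leading line and retry.
            continue
    raise ValueError("No valid JSON found")


# A plain text header followed by JSON is handled as intended.
result = find_json_sketch('some header text\n{"changes": {}}')
```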

Unfortunately, the call to splitlines() is naive here: it will split on characters which are valid to embed in JSON, and the result is malformed JSON once those characters are converted to newlines. A complete listing is contained in the stdlib documentation for str.splitlines(). Particularly problematic is the splitting on U+2028 (Line Separator) and U+2029 (Paragraph Separator).
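The failure can be reproduced with the standard library alone. The snippet below is an illustrative sketch with a made-up payload; it shows that a JSON document containing a raw U+2028 parses fine as-is, but no longer parses after a splitlines() and rejoin round trip.

```python
import json

# ensure_ascii=False keeps U+2028 as a literal character in the output,
# which is valid inside a JSON string.
payload = json.dumps({"msg": "first\u2028second"}, ensure_ascii=False)

# The original text parses without complaint.
original = json.loads(payload)

# splitlines() treats U+2028 as a line boundary, so the single-line
# document is cut in two in the middle of the string value.
lines = payload.splitlines()

# Rejoining with "\n" puts a raw newline inside the JSON string, which
# the parser rejects as an invalid control character.
rejoined = "\n".join(lines)
try:
    json.loads(rejoined)
    broken = False
except ValueError:
    broken = True
```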

We recently encountered this issue on a production system where the output from npm audit as part of an npm.install call via the npm.bootstrap state included several U+2028 characters in one of the JSON values. Salt split these characters into newlines resulting in malformed JSON, and the state would be marked as failed due to the missing changes dictionary in the returned output.

I’m not clear what the best way to solve this is, but expect that converting those Unicode characters to newlines is probably not the correct behaviour in any scenario we’d expect to encounter?
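One possible direction, offered only as a sketch and not as a proposed patch to Salt, is to split on "\n" alone so that Unicode separators inside JSON values are left untouched. The function name find_json_newline_only and the sample header text are hypothetical.

```python
import json


def find_json_newline_only(raw):
    """Hypothetical variant: split on "\\n" only, not on every Unicode line boundary."""
    # str.split("\n") leaves U+2028, U+2029 and similar characters in place.
    lines = raw.split("\n")
    for start in range(len(lines)):
        candidate = "\n".join(lines[start:])
        try:
            return json.loads(candidate)
        except ValueError:
            # Still not valid JSON: drop one more leading line and retry.
            continue
    raise ValueError("No valid JSON found")


# Header line followed by JSON whose value contains a raw U+2028.
raw = "plain text header\n" + json.dumps({"msg": "a\u2028b"}, ensure_ascii=False)
parsed = find_json_newline_only(raw)
```

A trade-off of this sketch is that a header terminated only by "\r" (old Mac style) would no longer be separated from the JSON that follows it, so whether this is acceptable depends on the inputs Salt expects to see.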