find_json() may output malformed JSON due to naive splitlines() usage
The find_json function in salt.utils.json is designed to extract valid JSON from a raw string which may begin with invalid JSON (e.g. due to a plain text header preceding the JSON data). It does this by calling Python’s splitlines() on the raw string and attempting to load the resulting data as JSON, incrementally removing lines from the beginning until it succeeds or runs out of lines.
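The behaviour described above can be sketched roughly as follows. This is a minimal re-creation written for illustration only, not the actual Salt source; the name find_json_sketch and the sample input are hypothetical.

```python
import json


def find_json_sketch(raw):
    """Illustrative re-creation of the described logic (not Salt's code)."""
    # Split the raw text into lines using str.splitlines().
    lines = raw.splitlines()
    # Try to parse the text starting at each line in turn, dropping
    # one more leading line on every failed attempt.
    for start in range(len(lines)):
        candidate = "\n".join(lines[start:])
        try:
            return json.loads(candidate)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            continue
    raise ValueError("no valid JSON found")


# A plain text header followed by JSON data (made-up sample input).
raw = 'some plain text header\n{"changes": {"pkg": "installed"}}'
result = find_json_sketch(raw)
```

With this input the first attempt fails because of the header line, and the second attempt parses the remaining line successfully.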
Unfortunately, the call to splitlines() is naive here, as it will split on characters which are valid to embed in JSON but which result in malformed JSON if converted to newlines. A complete listing is contained in the stdlib documentation. Particularly problematic is the splitting on U+2028 (Line Separator) and U+2029 (Paragraph Separator).
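The problem can be reproduced with the standard library alone. The snippet below is a small demonstration using a made-up payload (the key name advisory is hypothetical): the original string is valid JSON, but splitting it with splitlines() and rejoining with newlines puts a raw newline inside a JSON string, which the parser rejects. Splitting on "\n" only leaves the string intact.

```python
import json

# A JSON document whose string value contains U+2028 (Line Separator).
payload = '{"advisory": "first\u2028second"}'

# The payload parses as-is and keeps the separator in the value.
parsed = json.loads(payload)

# str.splitlines() treats U+2028 as a line boundary and splits the value.
pieces = payload.splitlines()

# Rejoining with "\n" places a raw newline inside the JSON string,
# which json.loads rejects as an invalid control character.
rejoined = "\n".join(pieces)
try:
    json.loads(rejoined)
    rejoined_is_valid = True
except ValueError:
    rejoined_is_valid = False

# Splitting on "\n" alone does not break the payload apart.
newline_only = payload.split("\n")
```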
We recently encountered this issue on a production system where the output from npm audit, as part of an npm.install call via the npm.bootstrap state, included several U+2028 characters in one of the JSON values. Salt split these characters into newlines, resulting in malformed JSON, and the state would be marked as failed due to the missing changes dictionary in the returned output.
I’m not clear what the best way to solve this is, but I expect that converting those Unicode characters to newlines is probably not the correct behaviour in any scenario we’d expect to encounter.
@sagetherage @waynew Sorry for the delayed response! Yes, my read of @waynew’s approach seems like a pretty elegant fix for the issue 👍
sorry, my mistake in closing, re-opened