Parsing exception when text contains unicode null
Why is an exception thrown in this case? I could not find any evidence that nulls are illegal. Other Unicode control characters are fine (the bell character, for example). The exception is thrown whether the Unicode null is a numeric character reference or simply a literal.
var options = new HtmlParserOptions();
options.IsStrictMode = true;
options.IsEmbedded = true;
var parser = new HtmlParser(options);
// throws HtmlParseException or CharacterReferenceInvalidCode
parser.Parse(@"<html><body>&#0;</body></html>");
// throws HtmlParseException or HtmlParseError.Null
parser.Parse("<html><body>\0</body></html>");
// does not throw
parser.Parse("<html><body>\a</body></html>");
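If strict mode has to stay on and the input may contain nulls, one possible workaround is to rewrite them before the text reaches the parser. The sketch below is only an illustration: `NullSanitizer` and `ReplaceNulls` are hypothetical names, not part of the library, and substituting U+FFFD (the replacement character) is an assumption about what the application wants. It mirrors what the HTML standard's error recovery does for a null numeric character reference. References written without a trailing semicolon are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the parser library.
public static class NullSanitizer
{
    // Matches numeric character references whose value is zero,
    // in decimal (&#0; &#00;) or hexadecimal (&#x0; &#X000;) form.
    private static readonly Regex NullReference =
        new Regex(@"&#(?:0+|[xX]0+);", RegexOptions.CultureInvariant);

    public static string ReplaceNulls(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Rewrite null references first, then any literal U+0000 characters.
        var withoutReferences = NullReference.Replace(html, "\uFFFD");
        return withoutReferences.Replace('\0', '\uFFFD');
    }
}
```

The cleaned string can then be passed to `parser.Parse(...)` as in the snippet above. Whether silently replacing nulls is acceptable depends on the application; rejecting the input outright may be the better choice when nulls indicate corrupted data.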
Issue Analytics
- Created 7 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
Thank you, you were very helpful and made my day. Have a good one!
Most of the spec was driven by historic decisions, e.g., how browser vendors in the past (mostly Netscape) decided to handle such undefined behavior.
Since browsers started diverging, a common spec needed to be designed. The majority usually won (i.e., one behavior was more common than the others and thus became part of the HTML5 specification).