Parsing exception when text contains unicode null

Why is an exception thrown in this case? I could not find any evidence that nulls are illegal. Other Unicode control characters (the bell character, for example) are accepted. The exception is thrown whether the Unicode null appears as a numeric character reference or as a literal character.

var options = new HtmlParserOptions();
options.IsStrictMode = true;
options.IsEmbedded = true;
var parser = new HtmlParser(options);

// throws HtmlParseException or CharacterReferenceInvalidCode
parser.Parse(@"<html><body>&#0;</body></html>");

// throws HtmlParseException or HtmlParseError.Null
parser.Parse("<html><body>\0</body></html>");

// does not throw
parser.Parse("<html><body>\a</body></html>");
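For context, the HTML specification classifies both a literal U+0000 in the input and a numeric character reference that resolves to zero as parse errors, which is presumably why strict mode rejects them while the bell character passes. If the input cannot be fixed at the source, one possible workaround is to sanitize the markup before handing it to the strict parser. The following is a minimal sketch under that assumption; `NullSanitizer` and `ReplaceNulls` are hypothetical names, not part of AngleSharp, and substituting U+FFFD (rather than dropping the character) is only one reasonable choice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it is not part of AngleSharp.
public static class NullSanitizer
{
    // Matches numeric character references whose value is zero,
    // in decimal (&#0; &#000;) or hexadecimal (&#x0; &#X00;) form.
    // Assumption: only references terminated by ';' are handled.
    private static readonly Regex NullReference =
        new Regex(@"&#(?:0+|[xX]0+);", RegexOptions.CultureInvariant);

    // Returns a copy of the markup in which literal U+0000 characters and
    // zero-valued numeric references are replaced by U+FFFD.
    public static string ReplaceNulls(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Replace literal NUL characters with the replacement character.
        string withoutLiterals = html.Replace('\0', '\uFFFD');

        // Rewrite zero-valued references to a reference for U+FFFD.
        return NullReference.Replace(withoutLiterals, "&#xFFFD;");
    }
}
```

The sanitized string could then be passed to `parser.Parse(...)` as in the snippets above. References such as `&#10;` or `&#01;` are left untouched because their value is not zero.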

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
StubbyCat2 commented, Oct 6, 2016

Thank you, you were very helpful and made my day. Have a good one!

0 reactions
FlorianRappl commented, Oct 6, 2016

Most of the spec was driven by historical decisions, e.g., how browser vendors in the past (mostly Netscape) decided to handle such undefined behavior.

Since browsers started diverging, a common spec needed to be designed. The majority usually won (i.e., one behavior was more common than the others and thus became part of the HTML5 specification).


Top Results From Across the Web

When parsing a text file with a Scanner why am I getting ...
1. The file is encoded using a multi-byte character set. · check the character encoding of the file. it's possible the file is...
Using the null character as a delimiter in a json string
For example, trying to decode the following JSON text leads to a parse error: ["this string contains the null character: \u0000"] All other ......
Article: XML error: Invalid null character in text to output
Issue Process returns the error when attempting to parse an XML formatted document: First document failure: Unable to store data, ...
Unicode Objects and Codecs
Return a bytes object. unicode cannot contain embedded null characters. Use PyUnicode_EncodeFSDefault() to encode a string to Py_FileSystemDefaultEncoding (the ...
Parsing arguments and building values
The Python string must not contain embedded null code points; if it does, a ValueError exception is raised. Unicode objects are converted to...
