Html Parser Strips CRLF and replaces with LF

This seems to be by design, but I’m wondering as to the intention here. I reviewed #149 and see why this should apply to XML, but perhaps it shouldn’t to HTML.

I was working on a project that uses Premailer.net (which in turn uses AngleSharp) to inline some CSS for emails. Every so often, a few email servers rejected the emails we sent out with errors about Maximum Line Length (see RFC 2822, section 2.1.1: https://www.ietf.org/rfc/rfc2822.txt), so I ended up investigating this in depth.

Using Premailer, we’d generate the HTML as a string and then use a different third-party library to send it out to interested parties. The HTML we passed to Premailer used CRLF line endings, for example: <html><head>etc</head>\r\n<body>yaddayadda</body></html>\r\n. After being processed by Premailer, all newlines would be replaced with \n.

Yet this presents a problem, as technically CRLF is the end-of-line marker per RFC 2616 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html). It seems like most servers are lax about this rule, while others follow it more strictly.
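To make the interaction between the two rules concrete, here is a minimal sketch (Python is used purely for illustration; the project itself is .NET). It assumes a strict receiver that treats only CRLF as a line break and applies the 998-character limit from RFC 2822, section 2.1.1. The function name and the sample data are hypothetical.

```python
MAX_LINE_LENGTH = 998  # RFC 2822 section 2.1.1 limit, excluding the CRLF


def longest_crlf_line(body: str) -> int:
    """Length of the longest line when only CRLF counts as a line break."""
    return max(len(line) for line in body.split("\r\n"))


# Hypothetical sample: 20 lines of 100 characters each.
lines = ["x" * 100] * 20
crlf_body = "\r\n".join(lines)
lf_body = "\n".join(lines)

print(longest_crlf_line(crlf_body))  # 100
print(longest_crlf_line(lf_body))    # 2019, over the 998 limit
```

With LF-only line endings, such a receiver sees the whole body as one long line, which would explain why only the stricter servers rejected the messages.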

After investigating, we traced the cause to the call to NormalizeForward in BaseTokenizer in AngleSharp, which normalizes all forms of newline to LF.
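As an illustration of the effect described here, and not AngleSharp’s actual implementation, the following Python sketch collapses CRLF and lone CR into LF. The function name is hypothetical; the sample string is the one from the example above.

```python
import re


def normalize_newlines(text: str) -> str:
    """Collapse CRLF and lone CR into LF (illustrative only)."""
    return re.sub(r"\r\n?", "\n", text)


source = "<html><head>etc</head>\r\n<body>yaddayadda</body></html>\r\n"
print(repr(normalize_newlines(source)))
# '<html><head>etc</head>\n<body>yaddayadda</body></html>\n'
```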

While I’m not 100% confident in my analysis, I figured reaching out wouldn’t hurt. It seems like one of the email clients we are using will replace LF with CRLF anyway, but for another, we have had sporadic delivery issues.
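One possible workaround, sketched here in Python for illustration only, is to restore CRLF line endings on the generated HTML string before handing it to the mail library. This is an assumption about what a post-processing step could look like, not something Premailer.net or AngleSharp provides; the function name is hypothetical.

```python
import re


def to_crlf(text: str) -> str:
    """Convert lone LF or lone CR to CRLF, leaving existing CRLF pairs as they are."""
    # The alternation tries "\r\n" first, so existing pairs are not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


processed = "<html><head>etc</head>\n<body>yaddayadda</body></html>\n"
print(repr(to_crlf(processed)))
# '<html><head>etc</head>\r\n<body>yaddayadda</body></html>\r\n'
```

Because existing CRLF pairs are matched as a unit, running the function twice gives the same result as running it once, so it is safe to apply even if a later stage also converts LF to CRLF.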

Issue Analytics

Created 5 years ago
Comments: 7 (3 by maintainers)

Top GitHub Comments

blumenshinem commented, Nov 10, 2018

Maybe I am misunderstanding something fundamental, however. AngleSharp is returning HTML, as it should. Protocols like HTTP/SMTP are perhaps not concerns of this library. Perhaps writing a new formatter would then be best.

Again, thanks for your consideration.

FlorianRappl commented, Nov 11, 2018

Cool thanks for following up on this! Never knew about the 998 char limit - thanks!