Html Parser Strips CRLF and replaces with LF

This seems to be by design, but I’m wondering as to the intention here. I reviewed #149 and see why this should apply to XML, but perhaps it shouldn’t to HTML.

I was working on a project that uses Premailer.net (which in turn uses AngleSharp) to inline some CSS for emails. Every so often, a few email servers rejected the emails we were sending out with errors about maximum line length (see RFC 2822, section 2.1.1: https://www.ietf.org/rfc/rfc2822.txt), so I ended up investigating this in depth.
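To illustrate the limit in question, here is a minimal Python sketch (not part of the original report) that flags lines longer than the 998 characters RFC 2822 section 2.1.1 allows, excluding the CRLF. The function name `overlong_lines` is hypothetical, and the sketch assumes the receiving server treats only CRLF as a line break and that the body is ASCII, so characters and bytes coincide.

```python
MAX_LINE_LENGTH = 998  # RFC 2822 section 2.1.1 limit, not counting the CRLF


def overlong_lines(body: str, limit: int = MAX_LINE_LENGTH) -> list[int]:
    """Return the 1-based numbers of CRLF-delimited lines longer than `limit`.

    Assumption: only "\r\n" counts as a line break, which is how a strict
    mail server might read the message. A bare "\n" does not end a line here.
    """
    return [
        number
        for number, line in enumerate(body.split("\r\n"), start=1)
        if len(line) > limit
    ]


# Two 600-character lines are fine when they end in CRLF ...
crlf_body = ("a" * 600 + "\r\n") * 2
# ... but with LF only, a strict reader sees one line of 1202 characters.
lf_body = ("a" * 600 + "\n") * 2
```

Under that assumption, a body whose individual lines are all short can still exceed the limit once its CRLF pairs have been rewritten to LF, which would be consistent with the sporadic rejections described here.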

Using Premailer, we’d generate the HTML as a string and then use a different third-party library to send it out to interested parties. The input we passed to Premailer used CRLF line endings, for example: <html><head>etc</head>\r\n<body>yaddayadda</body></html>\r\n. After being processed by Premailer, all newlines would be replaced with \n.

Yet this presents a problem, as CRLF is technically the end-of-line marker per RFC 2616 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html), and RFC 2822 likewise delimits the lines of an email message with CRLF. It seems like some servers are lax in following this rule, whereas others follow it more strictly.

After investigating, we traced the cause to the call to NormalizeForward in BaseTokenizer in AngleSharp, which normalizes all forms of newline to LF.

While I’m not 100% confident in my analysis, I figured reaching out wouldn’t hurt. It seems like one of the email clients we are using will replace LF with CRLF anyway, but for another, we have had sporadic delivery issues.
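The LF-to-CRLF replacement that one of those clients applies can also be done as a post-processing step on the generated HTML string, before it is handed to the mail library. The following Python sketch shows the idea only; `to_crlf` is a hypothetical helper, not an AngleSharp or Premailer API, and it assumes every newline in the output should become CRLF.

```python
import re

# Match CRLF first so an existing pair is kept as one unit,
# then fall back to a lone CR or a lone LF.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")


def to_crlf(text: str) -> str:
    """Rewrite every newline form (CRLF, CR, LF) in `text` as CRLF."""
    return _ANY_NEWLINE.sub("\r\n", text)


# Hypothetical parser output in which the original CRLFs became LF.
processed = "<html><head>etc</head>\n<body>yaddayadda</body></html>\n"
restored = to_crlf(processed)
```

Because CRLF is matched before the single characters, running the helper on text that already uses CRLF leaves it unchanged, so it is safe to apply regardless of which line endings the earlier steps produced.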

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
blumenshinem commented, Nov 10, 2018

Maybe I am misunderstanding something fundamental, however. AngleSharp is returning HTML, as it should. Protocols like HTTP/SMTP are perhaps not concerns of this service. Perhaps writing a new formatter would be best, then.

Again, thanks for your consideration.

0 reactions
FlorianRappl commented, Nov 11, 2018

Cool thanks for following up on this! Never knew about the 998 char limit - thanks!
