PrettyMarkupFormatter loses NoBreakSpaces/&nbsp; at the beginning or end of the content
Bug Report
AngleSharp 0.14.0
Description
Tidying an HTML document removes no-break space (&nbsp;) characters from the output if they are located at the beginning or the end of a line, or if they are the only content of an element.
Steps to Reproduce
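Given the following example:
using System;
using System.IO;
using AngleSharp;
using AngleSharp.Html;
using AngleSharp.Html.Parser;

class Program
{
    static void Main(string[] args)
    {
        // The paragraph contains nothing but a single no-break space.
        var documentStr = "<html><head></head><body><p>&nbsp;</p></body></html>";
        var parser = new HtmlParser();
        var document = parser.ParseDocument(documentStr);
        var sw = new StringWriter();
        document.ToHtml(sw, new PrettyMarkupFormatter());
        Console.WriteLine(sw.ToString());
        Console.ReadLine();
    }
}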
Expected behavior:
<html>
<head></head>
<body>
<p>&nbsp;</p>
</body>
</html>
Actual behavior:
<html>
<head></head>
<body>
<p></p>
</body>
</html>
Environment details: Windows 10, .NET Framework 4.6.1
Possible Solution
To me (I'm no expert), the cause is in PrettyMarkupFormatter.Text: it uses TrimEnd() and TrimStart(), which remove every whitespace character, including 0xa0 (the character that &nbsp; represents).
I think they should be kept, as removing those can break the display (<p></p> and <p>&nbsp;</p> are different in HTML/CSS, and browsers render them differently if they are empty).
This should at least be configurable through an option somewhere (it actually broke the display of my HTML document by removing those).
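The trimming behaviour described above can be checked without AngleSharp. The following is a minimal sketch, not the library's code: TrimHtmlSpaces is a hypothetical helper introduced here for illustration, and the character set it trims (space, tab, line feed, form feed, carriage return) is an assumption about what a formatter could safely strip.

```csharp
using System;

public static class NbspTrimDemo
{
    // Hypothetical helper (not part of AngleSharp): trims only the
    // ASCII whitespace characters and leaves U+00A0 (no-break space) alone.
    private static readonly char[] HtmlSpaces = { ' ', '\t', '\n', '\f', '\r' };

    public static string TrimHtmlSpaces(string value)
    {
        return value.Trim(HtmlSpaces);
    }

    public static void Main()
    {
        // A no-break space surrounded by two ordinary spaces.
        var text = " \u00A0 ";

        // Parameterless Trim() removes every character for which
        // char.IsWhiteSpace is true, and that includes U+00A0.
        Console.WriteLine(text.Trim().Length);

        // Trimming with an explicit character set keeps the no-break space.
        Console.WriteLine(TrimHtmlSpaces(text).Length);
    }
}
```

The first call leaves an empty string and the second leaves a single U+00A0 character, which matches the difference between the actual and expected output in this report.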
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Alright!
As for potential options, I'd personally include one called "DontEmptyTags" or something along those lines.
Discussing whether that is a good idea or not is definitely out of my league, so if this is the expected behaviour, I'd still document it somewhere and close this issue.
Thanks!
@Tetedeiench I had the same issue; this is what I did as a workaround. Essentially, there are two ways a non-breaking space can be represented: one is the literal U+00A0 character, which is not ASCII but looks like a normal space, and the other is the &nbsp; entity you described.
@FlorianRappl I think an option like this can be added to the formatter to handle the scenario @Tetedeiench described, and you can use the code below, or something similar, to accomplish it.
Three years late, but I’m sure this will help others who have the same issue.
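The commenter's snippet did not survive on this page, so what follows is not their code. It is a minimal sketch of one possible workaround that covers both representations mentioned above, under stated assumptions: NbspGuard and FormatPreservingNbsp are hypothetical names, the private-use character U+E000 is assumed not to occur in the document, and the formatter is assumed to pass that character through unchanged. Numeric references such as &#160; are not handled, and the plain string replacement also touches script and attribute content.

```csharp
using System;

public static class NbspGuard
{
    // Assumption: U+E000 (a private-use character) never appears in the input.
    // It is not whitespace, so TrimStart()/TrimEnd() will not remove it.
    private const string Placeholder = "\uE000";

    // 'format' is any function that takes HTML and returns formatted HTML,
    // for example a wrapper around the parse and ToHtml calls from the report.
    public static string FormatPreservingNbsp(string html, Func<string, string> format)
    {
        // Hide both the entity and the literal character behind the placeholder.
        var guarded = html
            .Replace("&nbsp;", Placeholder)
            .Replace("\u00A0", Placeholder);

        var formatted = format(guarded);

        // Restore every placeholder as the entity form.
        return formatted.Replace(Placeholder, "&nbsp;");
    }
}
```

To use it with the example from the report, the delegate would parse its argument with HtmlParser, write it through PrettyMarkupFormatter into a StringWriter, and return the resulting string.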