PrettyMarkupFormatter loses NoBreakSpaces/&nbsp; at the beginning or end of the content

Bug Report

AngleSharp 0.14.0

Description

Tidying an HTML document removes no-break space (&nbsp;) characters from the output if they are located at the beginning or the end of the content, or if they are its only content.

Steps to Reproduce

Given the following example:

        static void Main(string[] args)
        {
            var documentStr = "<html><head></head><body><p>&nbsp;</p></body></html>";
            var parser = new HtmlParser();
            
            var document = parser.ParseDocument(documentStr);
            var sw = new StringWriter();

            document.ToHtml(sw, new PrettyMarkupFormatter());

            Console.WriteLine(sw.ToString());
            Console.ReadLine();
        }

Expected behavior:

<html>
        <head></head>
        <body>
                <p>&nbsp;</p>
        </body>
</html>

Actual behavior:

<html>
        <head></head>
        <body>
                <p></p>
        </body>
</html>

Environment details: Windows 10, .NET Framework 4.6.1

Possible Solution

To me (I’m no expert), the cause is in PrettyMarkupFormatter.Text: it uses TrimEnd() and TrimStart(), which remove any whitespace character, including 0xA0 (the character that &nbsp; represents).

I think they should be kept, as removing those can break the display (<p></p> and <p>&nbsp;</p> are different in HTML/CSS, and browsers render an empty element differently).

This should at least be configurable through an option somewhere (it actually broke the display of my HTML document by removing those).
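
To illustrate the trimming behaviour described above, here is a minimal sketch in plain .NET (no AngleSharp involved). `TrimAsciiWhitespace` is a hypothetical helper written for this example, not an AngleSharp API, and the character set is an assumption based on the ASCII whitespace list in the HTML standard.

```csharp
using System;

public static class NbspTrimDemo
{
    // ASCII whitespace only; U+00A0 (no-break space) is deliberately absent.
    private static readonly char[] AsciiWhitespace = { ' ', '\t', '\n', '\f', '\r' };

    // Hypothetical helper (not part of AngleSharp): trims ASCII whitespace only.
    public static string TrimAsciiWhitespace(string text) => text.Trim(AsciiWhitespace);

    public static void Main()
    {
        var text = "\u00A0"; // the character a parser produces for "&nbsp;"

        // Parameterless Trim() treats every Unicode whitespace character,
        // including U+00A0, as trimmable.
        Console.WriteLine(text.Trim().Length);              // 0

        // Trimming an explicit ASCII-only set leaves the no-break space alone.
        Console.WriteLine(TrimAsciiWhitespace(text).Length); // 1
    }
}
```

A formatter that trimmed this way would still collapse ordinary indentation and line breaks while keeping a `<p>&nbsp;</p>` non-empty.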

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
Tetedeiench commented, Apr 15, 2020

Alright!

As for potential options, I’d personally include one named “DontEmptyTags” or something along those lines.

Going further in discussing whether it is a good idea or not is definitely out of my league, so if that’s the expected behaviour, I’d still document it somewhere and close this issue.

Thanks!

1 reaction
TopCoder02 commented, Mar 3, 2023

@Tetedeiench I had the same issue; this is what I did as a workaround. Essentially, there are two ways a non-breaking space can be represented: one is the literal character U+00A0, which is not ASCII but looks like a normal space, and the other is the &nbsp; entity you described.

@FlorianRappl I think an option like this could be added to the formatter to handle the scenario @Tetedeiench describes, using the code below or something similar.

Three years late, but I’m sure this will help others who have the same issue.

' Requires: Imports System.IO, System.Text.RegularExpressions, AngleSharp, AngleSharp.Html
Private Function MakePrettyHTML(ByVal vHTMLString As String) As String
     Dim PrettyParser As New AngleSharp.Html.Parser.HtmlParser
     ' Swap a non-breaking space that directly follows a tag for a placeholder the formatter will not trim.
     ' \u00A0 is the literal character; &nbsp; is the entity form.
     vHTMLString = Regex.Replace(vHTMLString, "(?<=>)\u00A0(?=[ \r\n\t<\/])", "~NBS~", RegexOptions.IgnoreCase)
     vHTMLString = Regex.Replace(vHTMLString, "(?<=>)&nbsp;(?=[ \r\n\t<\/])", "~NBS2~", RegexOptions.IgnoreCase)
     Dim oDocument = PrettyParser.ParseDocument(vHTMLString)
     Dim sw = New StringWriter()
     oDocument.ToHtml(sw, New PrettyMarkupFormatter())
     Dim oPrettyHTML As String = sw.ToString()
     ' Put the original non-breaking spaces back.
     oPrettyHTML = oPrettyHTML.Replace("~NBS~", ChrW(&HA0))
     oPrettyHTML = oPrettyHTML.Replace("~NBS2~", "&nbsp;")
     Return oPrettyHTML
 End Function
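
For readers working in C#, the same placeholder idea can be sketched independently of any formatter. This is only an illustration: `NbspGuard` and its `prettyPrint` parameter are hypothetical names, and the delegate stands in for whatever call produces the pretty-printed markup (for example, a wrapper around the `ToHtml` call shown in the issue).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NbspGuard
{
    private const string CharToken = "~NBS~";    // stands in for a literal U+00A0
    private const string EntityToken = "~NBS2~"; // stands in for the &nbsp; entity

    // A non-breaking space directly after '>' and directly before
    // whitespace, '<' or '/', mirroring the workaround's patterns.
    private static readonly Regex LeadingChar =
        new Regex(@"(?<=>)\u00A0(?=[ \r\n\t</])");
    private static readonly Regex LeadingEntity =
        new Regex(@"(?<=>)&nbsp;(?=[ \r\n\t</])");

    // prettyPrint is supplied by the caller (an assumption of this sketch).
    public static string Format(string html, Func<string, string> prettyPrint)
    {
        // 1. Hide the non-breaking spaces behind plain-text placeholders.
        var guarded = LeadingChar.Replace(html, CharToken);
        guarded = LeadingEntity.Replace(guarded, EntityToken);

        // 2. Let the formatter run; it sees ordinary text, so nothing is trimmed.
        var pretty = prettyPrint(guarded);

        // 3. Restore the original characters and entities.
        return pretty.Replace(CharToken, "\u00A0").Replace(EntityToken, "&nbsp;");
    }
}
```

As with the original patterns, this only protects a non-breaking space that comes immediately after a tag; one that follows other text (such as `<p>text&nbsp;</p>`) is not matched and would need an additional pattern.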