An immediately closed HTML tag does not parse properly
I’m trying to parse some user-supplied HTML which happens to contain an empty <div></div> tag.
When parsing into XML, I get this: <div/>
And then when parsing back into HTML, the <div> becomes a wrapper for the content that follows it.
Ex:
<div></div>
<p>hello world</p>
Parsing into XML:
const xml = $.load('<div></div><p>hello world</p>', {xmlMode: true});
Parsing back into HTML:
const newHtml = xml.html({decodeEntities: true});
Gives:
<div>
<p>hello world</p>
</div>
This is an issue because any styling applied to the div now also affects the content it wraps.
I’m not exactly sure if this is a bug or intentional. I know that <div></div> doesn’t make much sense, but it’s something I have to deal with. To clarify: if the div has a space within it, the HTML is parsed as expected.
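For context, an HTML parser ignores the trailing slash on a non-void element, so <div/> is read as an opening <div> and swallows whatever follows. One possible workaround, sketched below, is to post-process the serialized string and expand self-closing non-void tags before the markup reaches a browser. This is only a sketch under stated assumptions: expandSelfClosingTags is a hypothetical helper, not part of cheerio, and the regex does not handle a ">" inside an attribute value.

```javascript
// Void elements per the HTML standard: they never take a closing tag,
// so a trailing "/>" on them is harmless when read as HTML.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper (not a cheerio API): rewrites <tag .../> as
// <tag ...></tag> for every element that is not a void element.
function expandSelfClosingTags(markup) {
  return markup.replace(
    /<([A-Za-z][A-Za-z0-9-]*)((?:\s[^<>]*?)?)\s*\/>/g,
    (match, name, attrs) =>
      VOID_ELEMENTS.has(name.toLowerCase())
        ? match // leave <br/>, <img/>, etc. untouched
        : `<${name}${attrs.trimEnd()}></${name}>`
  );
}

console.log(expandSelfClosingTags('<div/><p>hello world</p>'));
// prints: <div></div><p>hello world</p>
```

With the tag expanded, the div is closed before the paragraph starts, so the paragraph is no longer nested inside it.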
Issue Analytics
- State:
- Created 5 years ago
- Comments:8 (2 by maintainers)
Top GitHub Comments
fyi unicron’s repro above was obtained from the options
This outputs xhtml
content.html() === '<p><strong/>Hello <br/>world</p>'
Our difficulty with this is that we are outputting our document as HTML. Browsers cope with HTML containing <br/> but don’t cope with <strong/>, so we need to fix our inconsistencies.
On further reflection I don’t think the issue we have experienced is with cheerio.
Fixed by #985