Cheerio produces self-closing tags when element content is empty
Hi Cheerio Team,
I’m trying to solve the issue with self-closing tags. This code:
const cheerio = require('cheerio')
const $ = cheerio.load('<div data-message="msg"></div>', {
  xmlMode: true,
  decodeEntities: false,
})
console.log($.xml())
Outputs: <div data-message="msg"/>
But I need the following output: <div data-message="msg"></div>
Also, I have to keep the cheerio options xmlMode: true and decodeEntities: false.
Is there any way to tell cheerio to close tags in such cases?
Issue Analytics
- Created 5 years ago
- Reactions: 7
- Comments: 6 (1 by maintainers)
Top GitHub Comments
Even though I’m rather new to cheerio, I’ve already been struggling quite a bit with those self-closing tags - and I think now that I can help here.
First, let me state a few assumptions I am making:
- You are using the version of cheerio that npm install cheerio gets you at the moment, and your example can be reproduced with it.
- In HTML, only a few tags may be self-closing: <link />, <meta />, <br/>, and some more, but not <script> (!). Of course something like <div/> isn’t allowed either.
- In the upcoming v1.0.0: a) $.xml, $.html and $.text will be deprecated. b) cheerio.load('<div class="foo"></div>') will no longer get you a representation of only that fragment, but will rather behave like a browser: it’ll put <!DOCTYPE html><html><head></head><body>...</body></html> around it. c) instance.html(selector), instance.xml(selector) and instance.text(selector) will also be deprecated. So getting (just) the outerHTML of a particular tag will be somewhat trickier.
To be clear: these are assumptions of mine. I am not entirely sure about all of them, particularly those w.r.t. the differences to the upcoming v1.0.0. Please do correct me wherever I might be wrong!
As for 2) and 3), the actual output in the example is valid XML but invalid HTML. So maybe one shouldn’t expect to get a non-self-closing, empty <div></div> from .xml(). But .html() definitely should NOT return invalid HTML, all the more so if it was valid in the first place. However, in this example, it makes no difference which one you call; the output is the same for both: <div/>.
The workaround to enforce an empty <div></div> rather than a self-closing <div/> (and the like): put an empty text node inside! This can be achieved by
$('div').filter((i, e) => !e.children.length).text('')
The .filter(...) ensures that we don’t overwrite the innards of any non-empty <div>. Of course, this should be applied to all empty tags that are not allowed to be self-closing in HTML.
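The last point, applying the workaround to every empty tag that HTML does not allow to be self-closing, can be sketched without any dependencies. The sketch below works on a plain object tree that only mimics the shape of parsed nodes (type, name, children); it is not cheerio code, and VOID_ELEMENTS and padEmptyElements are hypothetical names chosen for illustration. The void-element list is taken from the HTML standard.

```javascript
// Void elements per the HTML standard: only these may be written as <br/> etc.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: walk a node tree and give every empty, non-void
// element an empty text child, so that a serializer which self-closes
// childless elements would emit <div></div> instead of <div/>.
function padEmptyElements(node) {
  if (node.type !== 'tag') return node; // text nodes are left alone
  if (node.children.length === 0) {
    if (!VOID_ELEMENTS.has(node.name)) {
      node.children.push({ type: 'text', data: '' });
    }
    return node;
  }
  node.children.forEach((child) => padEmptyElements(child));
  return node;
}

// Toy tree standing in for <div><br/><span></span>hello</div>
const tree = {
  type: 'tag',
  name: 'div',
  children: [
    { type: 'tag', name: 'br', children: [] },
    { type: 'tag', name: 'span', children: [] },
    { type: 'text', data: 'hello' },
  ],
};

padEmptyElements(tree);
```

After the call, the empty span holds one empty text node while the br stays childless, so only the br would still be eligible for self-closing. Non-empty elements are never overwritten, which is the same guarantee the .filter(...) gives in the one-liner.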
There’s quite a bit more to say about this, e.g. when <script> tags must be empty (but not self-closing), or how to actually enforce self-closing tags like <meta .../> even when they were (incorrectly) either empty (<meta...></meta>) or simply left open (just <meta...>)…
Also TODO: 4) and 5)
If you have to use xmlMode, you can disable self-closing tags using the selfClosingTags option. The example then outputs <div data-message="msg"></div>. Note that this will disable parsing self-closing tags as well.
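To picture what the option changes on the output side, here is a toy serializer. It is not cheerio's or its serializer's actual code: it assumes that in XML mode a childless element is written self-closed unless selfClosingTags is set to false, and render with its options object is a hypothetical stand-in. Attribute values are written verbatim, without entity handling.

```javascript
// Toy XML-mode serializer, for illustration only.
function render(node, options = {}) {
  if (node.type === 'text') return node.data;

  // Attributes are emitted as-is (no escaping in this sketch).
  const attrs = Object.entries(node.attribs || {})
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');

  // Assumption: childless elements self-close unless the option forbids it.
  const selfClose =
    node.children.length === 0 && options.selfClosingTags !== false;
  if (selfClose) return `<${node.name}${attrs}/>`;

  const inner = node.children.map((child) => render(child, options)).join('');
  return `<${node.name}${attrs}>${inner}</${node.name}>`;
}

// Stand-in for the parsed <div data-message="msg"></div> from the question.
const div = {
  type: 'tag',
  name: 'div',
  attribs: { 'data-message': 'msg' },
  children: [],
};

const selfClosed = render(div); // default behaviour in this model
const explicit = render(div, { selfClosingTags: false });
```

In this model selfClosed is the short form from the question and explicit is the open-and-close form the question asks for. The sketch covers serialization only; the effect on parsing mentioned above is not modeled.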