Cheerio produces self-closing tags when element content is empty

Hi Cheerio Team

I’m trying to solve the issue with self-closing tags. This code:

const cheerio = require('cheerio')
const $ = cheerio.load('<div data-message="msg"></div>',
  {
    xmlMode: true,
    decodeEntities: false,
  })

console.log($.xml())

Outputs: <div data-message="msg"/>

But I need the following output: <div data-message="msg"></div>. Also, I have to keep the cheerio options xmlMode: true and decodeEntities: false.

Is there any way to tell cheerio to close tags in such cases?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 7
  • Comments:6 (1 by maintainers)

Top GitHub Comments

5 reactions
meisl commented, Oct 19, 2018

Even though I’m rather new to cheerio, I’ve already been struggling quite a bit with those self-closing tags - and I think now that I can help here.

First, let me state a few assumptions I am making:

  1. The exact version used should be stated. I think we’re talking v1.0.0-rc.2 here, since that’s what npm install cheerio gets you at the moment, and your example can be reproduced with it.
  2. XML allows any tag to be self-closing.
  3. By contrast, HTML only allows a few tags to be self-closing:
    • allowed: <link />, <meta />, <br/>, and some more
    • NOT allowed: most other tags, in particular <script> (!). Of course something like <div/> isn’t allowed either.
  4. As I understand it, the API will change substantially with v1.0.0 (NOT yet in v1.0.0-rc.2). In particular:
    • a) the static methods $.xml, $.html and $.text will be deprecated.
    • b) cheerio.load('<div class="foo"></div>') will no longer give you a representation of just that fragment, but will behave like a browser: it’ll put a <!DOCTYPE html><html><head></head><body>...</body></html> around it.
    • c) instance.html(selector), instance.xml(selector) and instance.text(selector) will also be deprecated. So getting (just) the outerHTML of a particular tag will be somewhat trickier.
  5. There seem to be a number of issues revolving around the problem at hand. They should cross-reference each other, IMHO.

To be clear: these are assumptions of mine. I am not entirely sure about all of them, particularly those w.r.t. the differences to the upcoming v1.0.0. Please do correct me wherever I might be wrong!


As for 2) and 3), the actual output in the example is valid XML but invalid HTML. So maybe one shouldn’t expect to get a non-self-closing, empty <div></div> from .xml(). But .html() definitely should NOT return invalid HTML, all the more so if the input was valid in the first place. However, in this example, it makes no difference which one you call; the output is the same for both: <div/>.

The workaround to force an empty <div></div> rather than a self-closing <div/> (and the like): put an empty text node inside! This can be achieved with $('div').filter((i, e) => !e.children.length).text(''). The .filter(...) ensures that we don’t overwrite the contents of any non-empty <div>.

Of course this should be applied to all empty tags that are not allowed to be self-closing in HTML.
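
To illustrate the "which tags may stay self-closing" rule, here is a minimal sketch that post-processes the serialized string instead of touching the DOM. Note that this is a different technique from the empty-text-node workaround above, and it does not use cheerio at all. The names VOID_ELEMENTS and expandSelfClosing are hypothetical helpers made up for this example. The regex is deliberately naive: it assumes no attribute value contains '<' or '>' and that no comment or CDATA section contains text that looks like a self-closing tag.

```javascript
// HTML void elements: the tags that never take an end tag in HTML.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper (NOT part of cheerio): rewrites `<tag .../>` as
// `<tag ...></tag>` unless the tag is an HTML void element.
// Assumption: attribute values contain neither '<' nor '>'.
function expandSelfClosing(markup) {
  return markup.replace(
    /<([A-Za-z][\w:.-]*)([^<>]*?)\s*\/>/g,
    (match, name, attrs) =>
      VOID_ELEMENTS.has(name.toLowerCase())
        ? match // void elements are left untouched
        : `<${name}${attrs}></${name}>`
  );
}

console.log(expandSelfClosing('<div data-message="msg"/>'));
// prints: <div data-message="msg"></div>
console.log(expandSelfClosing('<p><br/><span/></p>'));
// prints: <p><br/><span></span></p>
```

In the original example you would pass the result of $.xml() through such a function. For anything beyond simple markup, working on the DOM (as in the workaround above) is likely the safer route.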


There’s quite a bit more to say about this, e.g. when <script> tags must be empty (but not self-closing), or how to actually enforce self-closing tags like <meta .../> even when they were (incorrectly) either empty (<meta...></meta>) or simply left open (just <meta...>)…

Also TODO: 4) and 5)

3 reactions
fb55 commented, Dec 21, 2020

If you have to use xmlMode, you can disable self-closing tags using the selfClosingTags option:

const $ = cheerio.load('<div data-message="msg"></div>',
  {
    xmlMode: true,
    decodeEntities: false,
    selfClosingTags: false,
  })

console.log($.xml())

Outputs: <div data-message="msg"></div>.

Note that this will disable parsing self-closing tags as well.
