Custom tags moved from <head> to <body>
I have been using Cheerio for years in AkashaCMS and have been very happy with what I can do with it.
I just upgraded to RC10 and there is a serious problem. In AkashaCMS, I use Cheerio to implement a bunch of custom tags, like <ak-stylesheets> and many more. These are not standard HTML tags, of course, and I have code that uses Cheerio to find them and replace them with regular HTML.
What's happening is that custom tags inside the <head>...</head> section are being moved into the <body>...</body> section. Then, when my code substitutes the custom tags, their content ends up inside <body>, where it does not belong.
What follows is a brief example. My code of course does a lot more, but I narrowed the problem down to this. The example uses cheerio.load to parse some HTML, then immediately uses $.root().html() to serialize it back to text.
The behavior is that every element within <head> following a custom (non-standard) tag is moved into <body>. As this code stands, the <meta> tag stays within <head>, but if you move the <funky-bump> tag before <meta>, then the <meta> tag moves into the <body>.
const cheerio = require('cheerio');
const text = `<!doctype html>
<!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ -->
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8" lang="en"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]-->
<!-- Consider adding a manifest.appcache: h5bp.com/d/Offline -->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8" />
<!-- Use the .htaccess and remove these lines to avoid edge case issues. More info: h5bp.com/i/378 -->
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>Show Content</title>
<meta name="foo" content="bar"/>
<funky-bump></funky-bump>
<ak-stylesheets></ak-stylesheets>
<ak-headerJavaScript></ak-headerJavaScript>
<rss-header-meta href="/rss-for-header.xml"></rss-header-meta>
<external-stylesheet href="http://external.site/foo.css"></external-stylesheet>
<dns-prefetch
control="we must have control"
dnslist="foo1.com,foo2.com,foo3.com"></dns-prefetch>
<site-verification google="We are good"></site-verification>
<xml-sitemap></xml-sitemap>
<xml-sitemap href="/foo-bar-sitemap.xml" title="Foo Bar Sitemap"></xml-sitemap>
</head>
<body>
THE PROBLEM - Stylesheets etc are rendering into the body rather than the head
Immediately before this, published Mahabhuta 0.7.4... did this break something?
<h1>Show Content</h1>
<section id="teaser"><ak-teaser></ak-teaser></section>
<article id="original">
<p><show-content id="simple" href="/shown-content.html"></show-content></p>
<p><show-content id="dest" dest="http://dest.url" href="/shown-content.html"></show-content></p>
<p><show-content id="template"
template="ak_show-content-card.html.ejs"
href="/shown-content.html"
content-image="/imgz/shown-content-image.jpg"
>
Caption text
</show-content></p>
<p><show-content id="template2"
template="ak_show-content-card.html.ejs"
href="/shown-content.html"
dest="http://dest.url"
content-image="/imgz/shown-content-image.jpg"
>
Caption text
</show-content></p>
</article>
<article id="duplicate">
<ak-insert-body-content></ak-insert-body-content>
</article>
<ak-footerJavaScript></ak-footerJavaScript>
</body>
</html>`;
const opts = {
    recognizeSelfClosing: true,
    recognizeCDATA: true,
    decodeEntities: true
};
const $ = cheerio.load(text, opts);
console.log($.root().html());
And the output:
$ node cheeriot.js
<!DOCTYPE html><!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ --><!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en"> <![endif]--><!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8" lang="en"> <![endif]--><!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]--><!-- Consider adding a manifest.appcache: h5bp.com/d/Offline --><!--[if gt IE 8]><!--><html class="no-js" lang="en"><!--<![endif]--><head>
<meta charset="utf-8">
<!-- Use the .htaccess and remove these lines to avoid edge case issues. More info: h5bp.com/i/378 -->
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Show Content</title>
<meta name="foo" content="bar">
</head><body><funky-bump></funky-bump>
<ak-stylesheets></ak-stylesheets>
<ak-headerjavascript></ak-headerjavascript>
<rss-header-meta href="/rss-for-header.xml"></rss-header-meta>
<external-stylesheet href="http://external.site/foo.css"></external-stylesheet>
<dns-prefetch control="we must have control" dnslist="foo1.com,foo2.com,foo3.com"></dns-prefetch>
<site-verification google="We are good"></site-verification>
<xml-sitemap></xml-sitemap>
<xml-sitemap href="/foo-bar-sitemap.xml" title="Foo Bar Sitemap"></xml-sitemap>
THE PROBLEM - Stylesheets etc are rendering into the body rather than the head
Immediately before this, published Mahabhuta 0.7.4... did this break something?
<h1>Show Content</h1>
<section id="teaser"><ak-teaser></ak-teaser></section>
<article id="original">
<p><show-content id="simple" href="/shown-content.html"></show-content></p>
<p><show-content id="dest" dest="http://dest.url" href="/shown-content.html"></show-content></p>
<p><show-content id="template" template="ak_show-content-card.html.ejs" href="/shown-content.html" content-image="/imgz/shown-content-image.jpg">
Caption text
</show-content></p>
<p><show-content id="template2" template="ak_show-content-card.html.ejs" href="/shown-content.html" dest="http://dest.url" content-image="/imgz/shown-content-image.jpg">
Caption text
</show-content></p>
</article>
<article id="duplicate">
<ak-insert-body-content></ak-insert-body-content>
</article>
<ak-footerjavascript></ak-footerjavascript>
</body></html>
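The symptom in the output above can also be checked mechanically rather than by eye. The following is a minimal sketch, assuming a plain string scan is good enough for a test fixture like this one. The helper name tagsAfterHead and its parameters are hypothetical and not part of Cheerio. Matching is case-insensitive because the serialized output lowercases tag names (ak-headerJavaScript comes back as ak-headerjavascript).

```javascript
// Hypothetical helper (not part of Cheerio): list which of the given tag
// names occur after the closing </head> in a serialized HTML document.
function tagsAfterHead(html, tagNames) {
  // Locate the closing head tag, ignoring case.
  const headEnd = html.search(/<\/head\s*>/i);
  if (headEnd === -1) return []; // no </head>, nothing to compare against

  const afterHead = html.slice(headEnd);
  return tagNames.filter((name) => {
    // Escape regex metacharacters so the tag name is matched literally.
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Require whitespace, "/" or ">" after the name so that a longer tag
    // name sharing the same prefix does not count as a match.
    return new RegExp('<' + escaped + '(?=[\\s/>])', 'i').test(afterHead);
  });
}

// Shortened stand-in for the broken output shown above.
const broken =
  '<html><head><meta charset="utf-8"></head>' +
  '<body><funky-bump></funky-bump><ak-stylesheets></ak-stylesheets></body></html>';

console.log(tagsAfterHead(broken, ['funky-bump', 'ak-stylesheets', 'meta']));
// prints [ 'funky-bump', 'ak-stylesheets' ]
```

Only pass tag names that are expected to stay in the head. Tags such as ak-teaser legitimately live in the body and would be reported as well.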
Top GitHub Comments
I have a solution. It involves forcing the use of htmlparser2, but not via the method in the previous comment. That method, copied from your documentation, does not work. Instead, browsing the source code, I found an undocumented option. Set it to true and the example works as expected. Set it to false and it fails as shown above.
Anything but plain HTML will require users to use htmlparser2. Agreed that this option should be documented better.
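The name of the option did not survive in the first comment above, so the following is only a sketch under a stated assumption: that the option meant is _useHtmlParser2, an internal and undocumented option in Cheerio 1.0.0-rc releases. Check the source of the installed Cheerio version before relying on it. The helper withHtmlparser2 is a hypothetical name used for illustration.

```javascript
// ASSUMPTION: the option name is missing from the comment above. This sketch
// assumes it is `_useHtmlParser2`, an internal, undocumented Cheerio option.
// `withHtmlparser2` is a hypothetical helper, not a Cheerio API.
function withHtmlparser2(baseOptions) {
  // Copy the caller's options and add the flag that selects htmlparser2.
  return { ...baseOptions, _useHtmlParser2: true };
}

// Same options as in the report, plus the assumed flag.
const opts = withHtmlparser2({
  recognizeSelfClosing: true,
  recognizeCDATA: true,
  decodeEntities: true
});

// With Cheerio installed, the options would be passed exactly as in the report:
//   const cheerio = require('cheerio');
//   const $ = cheerio.load(text, opts);
//   console.log($.root().html());
console.log(opts);
```

Building the options in one place makes the workaround easy to remove if a later release documents a supported way to select htmlparser2.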