Proposal for breaking change: removing the `root` element

Background

Cheerio’s load method accepts a string of markup and creates a selector function. This function is “bound” to a document whose contents are a node structure based on the input markup. It is intended to behave much like the global jQuery/$ function provided by the jQuery library (as if the library had been loaded in a document generated from the same markup by a web browser).

Historically, Cheerio has always attached the parsed markup to a non-standard “root” element (i.e. <root>). As far as I know, this was implemented to support the load method’s behavior when given markup describing a document fragment: strings like '<p>1</p><p>2</p>' could be passed to load and still produce a single top-level element.
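
To make the role of that wrapper concrete, here is a minimal, dependency-free sketch. It does not use Cheerio; the plain-object node shape and the `el`, `wrapInRoot`, and `find` helpers are hypothetical names invented for illustration. It only shows how a synthetic root gives a fragment with several top-level nodes a single entry point for traversal.

```javascript
// Hypothetical plain-object nodes; this is NOT Cheerio's internal data structure.
function el(name, children = []) {
  return { name, children };
}

// Wrap the top-level nodes of a parsed fragment in one synthetic <root> element.
function wrapInRoot(topLevelNodes) {
  return el('root', topLevelNodes);
}

// Depth-first search for descendant elements with a given tag name.
function find(node, name) {
  const matches = [];
  for (const child of node.children) {
    if (typeof child === 'string') continue; // skip text nodes
    if (child.name === name) matches.push(child);
    matches.push(...find(child, name));
  }
  return matches;
}

// Stand-in for the parsed form of '<p>1</p><p>2</p>': two sibling top-level elements.
const fragment = [el('p', ['1']), el('p', ['2'])];
const root = wrapInRoot(fragment);

console.log(root.name);              // root
console.log(find(root, 'p').length); // 2
```

Without the wrapper, the two paragraphs would have no common ancestor to start a traversal from, which is presumably the gap that <root> was filling.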

Thanks to @inikulin and his Parse5 library, the release candidate for version 1.0.0 normalizes the parsing behavior of load. It always produces a complete document, just like a web browser. The result is much more predictable, standards-compliant, and “familiar” to web developers. It’s a backwards-breaking change, though, and we hope to ease the upgrade path for consumers through a concise migration guide.

Proposal

Since we are already committed to a breaking change for version 1.0, I wanted to consider making Cheerio’s behavior even more browser-like. I’d like to get rid of the <root> element and instead rely on the <html> element (which, again, is either described by the consumer’s input markup or automatically created by Parse5).

This would involve removing the Cheerio-specific $.root() method from the API. Users who previously used it as a basis for traversal could replace code like $.root().find('div') with $('html').find('div').

However, many use cases involve rendering full documents. For this, $('html').html() is not equivalent because the resultant string does not include the document element itself. So we’ll need to keep another Cheerio-specific method: $.html().
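
The difference is the usual one between serializing an element's children and serializing the element itself. Here is a minimal sketch of that distinction, again without Cheerio: the node shape and the `el`, `innerHTML`, and `outerHTML` helpers are hypothetical, and text is not escaped, so treat it as an illustration rather than as how Cheerio renders documents.

```javascript
// Hypothetical plain-object nodes; this is NOT Cheerio's internal data structure.
function el(name, children = []) {
  return { name, children };
}

// Serialize a node including its own tags (analogous to rendering the full document).
function outerHTML(node) {
  if (typeof node === 'string') return node; // text node, emitted as-is (no escaping)
  return `<${node.name}>${innerHTML(node)}</${node.name}>`;
}

// Serialize only a node's children (analogous to calling .html() on a selection).
function innerHTML(node) {
  return node.children.map(outerHTML).join('');
}

const html = el('html', [
  el('head'),
  el('body', [el('p', ['1']), el('p', ['2'])]),
]);

const inner = innerHTML(html);
const outer = outerHTML(html);

console.log(inner); // <head></head><body><p>1</p><p>2</p></body>
console.log(outer); // <html><head></head><body><p>1</p><p>2</p></body></html>
```

The first string loses the <html> tags, which is why a document-level rendering method is still needed.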

Other than that, I think it would just be a matter of updating Cheerio’s internals to operate without <root>. I spent just enough time trying this out to see that it is not trivial, but I don’t believe that there is any technical reason it can’t be done.

I’d love to get feedback from any Cheerio user, but I’m particularly interested in hearing from @fb55, @matthewmueller, and @inikulin. Do you think this is a good idea? Do you think it would invalidate any existing use cases? Or is there any other reason it isn’t technically possible? Or more subjectively, do you think the change would be too jarring for consumers to justify the benefit?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
jugglinmike commented, Sep 24, 2017

Thanks for the feedback, @inikulin! I wanted to push for a more standards-compliant API if only to make the internals more familiar for contributors. After experimenting some more, though, I’ve come to realize that whatever the case, jQuery (and by extension Cheerio) does not offer an API for working with the owner document.

One day, it would be nice if we could more concretely document and support direct interaction with Cheerio’s DOM. I’m not sure if this is a realistic goal, though: if the DOM is implemented as a static data structure, then it will always be dangerous to encourage end-user manipulation, since many modifications could invalidate the document and cause instability in Cheerio’s behavior.

So until then, we’ll always need a Cheerio-specific API to support this use case. We might make changes to the underlying structure (for instance, using a “true” document node as opposed to an Element with tag name “root”), but that will be an implementation detail that we introduce essentially just for the sake of conformance; it won’t affect the API. In other words: I don’t have to block version 1.0 with my pedantic objections 😃

0 reactions
ljharb commented, Sep 25, 2017

@jugglinmike does this mean that some form of .root() will end up in v1?
