parse5 is about half the performance of htmlparser2
I’m running a benchmark that parses MDN and GitHub with _useHtmlParser2 set to true and to false, and I’m getting considerably faster times with htmlparser2.
Is there ongoing work on this that you need help with? I don’t feel great about passing _useHtml5Parser: true.
The benchmark simply loads either MDN and all of its (HTML) subresources, or a GitHub issue and all of its (HTML) subresources, with Cheerio.
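A comparison of this kind can be sketched roughly as follows. This is not the benchmark from the issue: timeParser and compareParsers are hypothetical helper names, the iteration counts are arbitrary, and the cheerio wiring is shown only in a comment, on the assumption that cheerio is installed and that load() accepts the _useHtmlParser2 option mentioned above.

```javascript
// Hypothetical harness: `parsers` maps a label to any function (html) => anything.
// Uses the global `performance` object available in recent Node.js versions.

function timeParser(parse, html, iterations = 50) {
  // Warm up so hot paths are likely compiled before timing starts.
  for (let i = 0; i < 5; i++) parse(html);
  const start = performance.now();
  for (let i = 0; i < iterations; i++) parse(html);
  // Mean wall-clock milliseconds per parse.
  return (performance.now() - start) / iterations;
}

function compareParsers(parsers, html, iterations = 50) {
  const results = {};
  for (const [label, parse] of Object.entries(parsers)) {
    results[label] = timeParser(parse, html, iterations);
  }
  return results;
}

// Assumed wiring for the two configurations discussed here (not executed):
//   const cheerio = require('cheerio');
//   compareParsers({
//     parse5: (html) => cheerio.load(html),
//     htmlparser2: (html) => cheerio.load(html, { _useHtmlParser2: true }),
//   }, html);
```

Passing the parsers in as plain functions keeps the timing loop independent of any one library, so the same harness could also time a hand-written parser.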
How do I work on this?
Thanks for the great library!
Issue Analytics
- State:
- Created 5 years ago
- Comments:11 (3 by maintainers)
Top GitHub Comments
I’m working on JS bindings for https://github.com/cloudflare/lol-html at the moment. It provides spec-compliant tokenisation with low output latency, along with CSS selector support, and is orders of magnitude faster than parse5. Maybe it will be useful for your case.
Hey, both ended up being too slow for my particular need, so I ended up doing the parsing myself in https://github.com/testimio/mhtml-parser, because I only needed very primitive processing and structure rather than a whole DOM tree.
I used _useHtmlParser2: true in my cheerio code at https://github.com/testimio/mhtml-parser/blob/master/src/link-replacer.js#L87 and I still use cheerio for SVGs 😃
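To illustrate what "very primitive processing" without a DOM tree can look like, here is a minimal sketch that scans a string for quoted href and src attribute values. It is not the code from mhtml-parser: extractLinks is a hypothetical helper, and the regular expression is a deliberate simplification that ignores unquoted values and will also match inside comments, scripts, and attributes such as data-src.

```javascript
// Hypothetical helper; NOT taken from mhtml-parser.
// Collects quoted href/src attribute values in document order
// without building any tree structure.
function extractLinks(html) {
  const links = [];
  // Group 1: attribute name; group 2: double-quoted value; group 3: single-quoted value.
  const attr = /\b(href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let m;
  while ((m = attr.exec(html)) !== null) {
    links.push({
      attribute: m[1].toLowerCase(),
      url: m[2] !== undefined ? m[2] : m[3],
    });
  }
  return links;
}
```

For example, extractLinks('<a href="/a">x</a>') returns a single entry with attribute 'href' and url '/a'. A single pass like this trades correctness on unusual markup for speed, which is only acceptable when the input is known to be simple.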