Discuss: File size of distributable for the browser
I wanted to open a discussion about the direction we’re headed with regard to file size. It’s come up recently in talking to StackOverflow, but otherwise I don’t hear much about it.
First some history
In absolute terms we’ve grown quite a bit in the past year or so. All numbers below are gzipped sizes.
- When I came on board in Sept 2019: ~20kb (`:common` subset, gzipped)
- Today: ~37kb (`:common`, gzipped)
We’ve close to doubled the size of our common distributable. Yet in that time we also added several new languages to `:common`:
- Go
- Kotlin
- Less and SCSS
- Lua
- Rust
- Swift
- TypeScript
33% of our size increase comes from just these new languages (i.e. ~20kb to 28kb). The rest comes from numerous grammar improvements, parser improvements, etc…
Here and Now / My Thoughts
We have our new “higher fidelity” initiative in #2500. Both 37kb and 20kb seem tiny to me. Yes, it’s possible to build much larger builds. The full library (with every grammar) weighs in at a whopping 272kb.
All the feedback I see here on issues is of the “please, better highlighting, highlight more, highlight better” variety… I can’t remember anyone pushing back with “the library is too large, make it smaller”. I wanted to open the topic to see if anyone has any thoughts on this.
Personally I feel that our situation now is good and that increasing the bundle even 30-40% would be a win if we end up with much more nuanced highlighting as a result. (I don’t think the size will actually increase that much though.) I don’t see how we can keep the size the same as we pursue higher fidelity and more nuanced highlighting. Many of the recent “language reboots” (LaTeX, Mathematica, etc.) have seen huge improvements in those grammars - but also a significant increase in grammar size.
I still think a very “popular” use for Highlight.js is on a small website/blog where one is using a subset of the languages, not a full build (or anything even close). Then on the other end you have huge sites like Discourse and StackOverflow building larger bundles. In those cases I think the right solution (if size becomes a problem) is to lazy-load the grammars on demand. Which we’ve always made easy to do and it just got easier with my PR to eliminate all run-time dependencies between languages.
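As a minimal sketch of what on-demand grammar loading might look like (the function and its names are illustrative, not an actual highlight.js API; in a real build each loader would be something like `() => import('highlight.js/lib/languages/go')` followed by a `registerLanguage` call):

```javascript
// Hypothetical lazy-loading helper: `loaders` maps a language name to a
// function returning a Promise for its grammar module. The in-flight
// Promise is cached, so each grammar is fetched at most once even when
// requested concurrently.
function makeLazyRegistry(loaders) {
  const cache = new Map();
  return async function ensureLanguage(name) {
    if (!cache.has(name)) {
      if (!loaders[name]) throw new Error(`unknown language: ${name}`);
      cache.set(name, loaders[name]()); // cache the Promise, not the result
    }
    return cache.get(name);
  };
}
```

Caching the Promise (rather than the resolved module) is what makes two near-simultaneous requests for the same grammar share a single network fetch.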
A good portion of our size is keyword bundles… There has been talk recently of whether (in some languages) we could detect `CamelCaseClassThingy` patterns rather than relying on a hard-coded list… and while we could do that, removing some keywords would have detrimental effects on our auto-detection capabilities, which for many languages are highly dependent on large keyword lists.
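For illustration, the kind of heuristic being discussed might be a regex along these lines (the pattern is my assumption, not an actual highlight.js rule):

```javascript
// Hypothetical heuristic: treat a word with two or more capitalized
// "humps" (e.g. FooBar, CamelCaseClassThingy) as a class name, instead
// of consulting a hard-coded keyword list.
const CAMEL_CASE_CLASS = /\b[A-Z][a-z]+(?:[A-Z][a-z0-9]*)+\b/;

function looksLikeClassName(word) {
  return CAMEL_CASE_CLASS.test(word);
}
```

The trade-off mentioned above applies directly: such a pattern highlights unknown class-like tokens for free, but gives auto-detection far weaker signals than an explicit keyword list does.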
Also, there are other highlighters. It’s always been my advice that if “small size” is a key requirement for someone that Prism might be a better choice as they tend to rely a lot more on tighter regex, simpler grammars, and dependency stacking… which helps them keep the size of each grammar smaller.
So I see our library continuing to grow slowly in size with every new release… and continuing to highlight with more nuance… with of course continued improvements to the parser and auto-detect when possible.
- Does anyone think this is the wrong direction?
- Should we have some sort of size cap on 1st party languages?
- Any other thoughts?
Note: Currently every language is built as a stand-alone module - which hurts our non-compressed size - since some dependency modules end up being duplicated in the source… this should have less bearing on the final gzip size though and there are also plans to fix this in the future (when using the official build system to build a monolithic distributable).
Issue Analytics
- Created 3 years ago
- Reactions: 1
- Comments: 8 (8 by maintainers)
I wasn’t referring entirely to actual real-life webpages or use cases - but rather implementor behavior, including the possibility of confusion and mistakes.
So it seems the actual need here is for a tiny fraction of use cases:
Given all that, I feel right now lazy-loading is best handled outside of core.
This issue was originally created following some discussions with Stack Overflow, who are extremely size sensitive. It took 3 months before anyone else chimed in on the topic. I just don’t think many people actually need this functionality (or care about size super strongly) - and even if I’m mistaken, I don’t (at this time) see huge advantages to it being in core vs a plug-in/add-on.
It seems (esp. after we release an ESM npm package) one could very easily write a small “wrapper” package such as `highlightjs-async` that provided a custom `index` (with addl. metadata and async registration calls) and then replaced/wrapped key API functions with async versions. I’d suggest this is even quite possible today without much effort using `fetch` instead of modules.

I have a take on this, and it might be ignorant of how people are actually using highlight.js, but here it is: recommend using ESM and stop caring about the bundled size. If we use a promise-based approach, we could use dynamic import (https://caniuse.com/es6-module-dynamic-import) to lazy load only the language parsers the user is actually using and have a pretty clean API:
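The code block that originally followed didn’t survive extraction. As a hedged sketch of what such a Promise-based call might look like (all names here are invented; the grammar loader is injected rather than hard-coding an `import()` path, purely to keep the sketch self-contained):

```javascript
// Hypothetical async wrapper: dynamically load and register a grammar
// on first use, then delegate to the regular synchronous highlight().
// `loadGrammar` would be e.g.
//   (lang) => import(`highlight.js/lib/languages/${lang}`)
async function highlightAsync(hljs, loadGrammar, language, code) {
  if (!hljs.getLanguage(language)) {
    const mod = await loadGrammar(language);
    hljs.registerLanguage(language, mod.default);
  }
  return hljs.highlight(code, { language });
}
```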
Using this approach, only IE users would suffer from the bundle size.
We would have to make some changes in the underlying API to deal with `Promise` and `import()`.

I might be wrong, but I am under the impression that most highlight.js users use a subset of the languages it provides (and sometimes only one). If we are moving to ESM in v11, I think we should do the extra work of building a Promise-based API; users would benefit from it.
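The snippet illustrating those API changes was lost from the original comment. As a rough sketch of one such change (names invented; the loader stands in for `(name) => import(...)` so the example stays self-contained), registration could become awaitable so a site preloads exactly the subset of grammars it uses:

```javascript
// Hypothetical awaitable registration: fetch the requested grammar
// modules in parallel, then register each one before resolving.
async function preloadLanguages(hljs, loadGrammar, names) {
  const mods = await Promise.all(names.map((n) => loadGrammar(n)));
  mods.forEach((mod, i) => hljs.registerLanguage(names[i], mod.default));
  return hljs;
}
```

A blog using two languages would then `await preloadLanguages(hljs, loadGrammar, ['go', 'rust'])` once at startup and call the existing synchronous API afterwards, which keeps the Promise surface area small.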