Auto-detect language per-line is guaranteed to produce poor results
Hey, current maintainer of Highlight.js here. This just came to my attention via #391.
You’re looping over lines and then calling highlightAuto on every line (when you don’t have a known language). This is not recommended and is guaranteed to produce poor results. Auto-detect is not intended to be useful with so little data, and the noise will often (as reported in #391) be much higher than the signal: you’re just as likely to get random languages as anything useful. There will be color, but often all wrong.
If you do wish to use auto-detect, you should pass us the ENTIRE document (or at the very least all the available lines from the document/diff), look at the language we determine it to be, and then use that language for every single line.
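A minimal TypeScript sketch of "detect once, from everything": all available lines are joined and handed to the detector in a single call, and the resulting language name is reused afterwards. The detector is injected so the sketch has no hard dependency on the library; `detectDocumentLanguage` and the `"plaintext"` fallback are illustrative choices, not part of Highlight.js. The only thing assumed about the detector's result is an optional `language` field, which `hljs.highlightAuto` provides.

```typescript
// Only the part of a highlightAuto-style result this sketch relies on.
interface AutoResult {
  language?: string;
}

type AutoDetector = (code: string) => AutoResult;

// Hypothetical helper: run auto-detection ONCE over every available line,
// instead of once per line, and return the detected language name.
function detectDocumentLanguage(
  lines: string[],
  highlightAuto: AutoDetector,
  fallback = "plaintext",
): string {
  // Joining the lines gives the detector the most context possible.
  const result = highlightAuto(lines.join("\n"));
  // Detection can fail to pick anything; fall back rather than guess.
  return result.language ?? fallback;
}
```

With the real library this might be called as `detectDocumentLanguage(allLines, (code) => hljs.highlightAuto(code))`, after which the returned name can be passed explicitly (in v11, `hljs.highlight(code, { language })`) so no further guessing happens.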
You’ll have to take this approach with version 11 anyway, since you’ll have to do the highlighting in a single pass (rather than per-line). So call highlightAuto upfront for all available lines, letting it use the greater amount of content for its auto-detection, then split that result back out into the individual lines you need, already highlighted for you.
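Splitting a single-pass result back into lines is the subtle step: a scope that spans a line break (a multi-line comment or string, say) opens on one line and closes on a later one. A minimal sketch, assuming the markup contains only `<span ...>` opening tags, `</span>` closing tags, and already-escaped text, as Highlight.js output normally does; `splitHighlightedLines` is a hypothetical helper, not a library function. At each newline it closes every open span and reopens the same spans at the start of the next line, so each line is well-formed HTML on its own.

```typescript
// Hypothetical helper: split one block of highlighted HTML into per-line
// HTML strings, carrying open <span> scopes across line breaks.
function splitHighlightedLines(html: string): string[] {
  const lines: string[] = [];
  const openTags: string[] = []; // stack of currently open <span ...> tags
  let current = "";
  // Tokens: opening span, closing span, newline, plain text, or a lone "<".
  const tokenRe = /<span[^>]*>|<\/span>|\n|[^<\n]+|</g;
  let match: RegExpExecArray | null;
  while ((match = tokenRe.exec(html)) !== null) {
    const token = match[0];
    if (token === "\n") {
      // End of a source line: close every scope that is still open...
      current += "</span>".repeat(openTags.length);
      lines.push(current);
      // ...and reopen the same scopes, outermost first, on the next line.
      current = openTags.join("");
    } else if (token === "</span>") {
      openTags.pop();
      current += token;
    } else if (token.startsWith("<span")) {
      openTags.push(token);
      current += token;
    } else {
      current += token; // plain (already escaped) text
    }
  }
  lines.push(current);
  return lines;
}
```

Usage might look like `splitHighlightedLines(hljs.highlight(code, { language }).value)`, where `.value` is the HTML string Highlight.js returns; the number of lines produced matches `code.split("\n").length`.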
You’ll have to do it twice, of course: once each for the before and after streams.
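For a diff, the two streams can be rebuilt from a hunk’s lines before highlighting. A minimal sketch, assuming standard unified-diff prefixes (a space for context, `-` for removed, `+` for added, and `\` for the "No newline at end of file" marker); `splitDiffSides` is a hypothetical name. Context lines belong to both sides, so each side reads as contiguous code.

```typescript
interface DiffSides {
  before: string[];
  after: string[];
}

// Hypothetical helper: rebuild the "before" and "after" text of one
// unified-diff hunk so each side can be highlighted in a single pass.
function splitDiffSides(hunkLines: string[]): DiffSides {
  const before: string[] = [];
  const after: string[] = [];
  for (const line of hunkLines) {
    const marker = line.charAt(0);
    const text = line.slice(1);
    if (marker === "\\") {
      continue; // "\ No newline at end of file" is metadata, not code
    } else if (marker === "-") {
      before.push(text); // removed: only exists in the old version
    } else if (marker === "+") {
      after.push(text); // added: only exists in the new version
    } else {
      before.push(text); // context: present on both sides
      after.push(text);
    }
  }
  return { before, after };
}
```

Each side would then be joined with "\n" and highlighted on its own, which is where the two passes come from.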
That’s not really what you want to do, as it will break on any scopes that persist past a line boundary (a multi-line comment or string, for example). What you’d really want to do:
You’d really need to do this for each section of a diff (if they are non-sequential). So if a diff included 3 discrete changes of ~10 lines each, you’d group each of those 3 changes into a block and highlight all 3 blocks, then split them apart again to get at the individual highlighted lines.
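The grouping step can be sketched as follows: changed lines are sorted by line number, split into runs of consecutive numbers, each run is highlighted as one block, and the per-line results are mapped back to their line numbers. This is a minimal sketch; `groupContiguous`, `highlightByBlock`, and the `highlightBlock` callback are hypothetical names. The callback is assumed to highlight the joined block in one pass and return exactly one HTML string per input line (for example, by splitting the output while reopening unclosed spans).

```typescript
interface NumberedLine {
  lineNo: number;
  text: string;
}

// Group lines into runs of consecutive line numbers (one run per diff change).
function groupContiguous(lines: NumberedLine[]): NumberedLine[][] {
  const sorted = lines.slice().sort((a, b) => a.lineNo - b.lineNo);
  const blocks: NumberedLine[][] = [];
  for (const line of sorted) {
    const last = blocks[blocks.length - 1];
    if (last && line.lineNo === last[last.length - 1].lineNo + 1) {
      last.push(line); // continues the current run
    } else {
      blocks.push([line]); // gap in numbering: start a new block
    }
  }
  return blocks;
}

// Highlight each block in a single pass, then map results back by line number.
function highlightByBlock(
  lines: NumberedLine[],
  highlightBlock: (block: string[]) => string[],
): Map<number, string> {
  const out = new Map<number, string>();
  for (const block of groupContiguous(lines)) {
    const html = highlightBlock(block.map((l) => l.text));
    if (html.length !== block.length) {
      throw new Error("highlightBlock must return one string per line");
    }
    block.forEach((l, i) => out.set(l.lineNo, html[i]));
  }
  return out;
}
```

For 3 separate changes this calls `highlightBlock` 3 times, once per block, rather than once per line; it would be run separately for the before and after sides.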
@iHiD thanks for the reference. Will definitely read it.