Ignore soft hyphens, word breaks & zero-width stuff
Please add support for highlighting content that includes a soft hyphen (&shy;), a word break opportunity HTML tag (<wbr>) or a zero-width space (&#8203;).
In this demo, I added a &shy;, <wbr> and &#8203; at various locations inside of the word “Lorem”. When using mark.js to highlight “Lorem”, only two out of the five results are marked. Performing a browser search reveals all five matches.
I believe this applies to all browsers. I tested the latest versions of Chrome and Firefox (Windows 10) with mark.js 8.1.1 and got the same results in both.
References:
- http://www.quirksmode.org/oddsandends/wbr.html
- https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr
- http://www.fileformat.info/info/unicode/char/200b/index.htm
I also found that the zero-width non-joiner (&zwnj;) and zero-width joiner (&zwj;) produce the same result. Odd that they aren’t discussed as much… I guess it’s because these last two are joiners and don’t appear to work with CSS hyphens (demo modified from this page).
Issue Analytics
- Created 7 years ago
- Comments:8 (3 by maintainers)
Top GitHub Comments
Ok, I think I’m going to go with ignoreJoiners because my other idea of using killWhitey didn’t seem too appropriate either.
I’ve started working on tests, and I have an odd question: why does the across elements filter file not have any elements to span across?
I’m all about weird stuff. Like me! 😀
I’ve already got something working. The regular expression starts to look really ugly, so I think disabling this option by default would be preferred.
So what would you want to name this option? I started out with ignoreWhitespace, but that isn’t accurate enough. It only applies to soft hyphens and some zero-width characters.
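To illustrate why the regular expression gets ugly, here is one way such a pattern could be built. This is only a sketch under assumptions: joinerTolerantRegExp and escapeRegExp are hypothetical helpers, not mark.js internals, and the character set is limited to the four characters named in this issue.

```javascript
// Hypothetical helpers for illustration; not mark.js's actual implementation.
// Character class covering soft hyphen, ZWSP, ZWNJ and ZWJ.
const JOINER_CLASS = '[\\u00AD\\u200B\\u200C\\u200D]';

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Allow any number of joiner characters between each character of the term.
function joinerTolerantRegExp(term) {
  const pattern = Array.from(term)
    .map(escapeRegExp)
    .join(`${JOINER_CLASS}*`);
  return new RegExp(pattern, 'gi');
}

const re = joinerTolerantRegExp('Lorem');
const text = 'Lorem Lo\u00ADrem Lor\u200Bem L\u200Corem Lore\u200Dm';
const matches = text.match(re);

console.log(re.source);      // the five-letter term already yields a long pattern
console.log(matches.length); // 5
```

Every search term grows by one character class per letter, which supports keeping the behaviour opt-in. If the option ships under the name proposed in this thread, enabling it would presumably look like passing { ignoreJoiners: true } in the mark.js options object.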