BiDi bug with zero width non joiner
مَتْنوَنِوِشْتِه
This string contains U+200C (ZERO WIDTH NON-JOINER), but the bidi algorithm strips it, causing wrong rendering:
Disabling the bidi algorithm fixes the rendering:
@typoman: is my conclusion correct in that the second screenshot is correct, and the first isn’t?
Issue Analytics
- Created 4 years ago
- Comments: 11 (9 by maintainers)
Top GitHub Comments
Looks like python-bidi will need to be modified to fix the original issue anyway, as it removes Boundary Neutral characters unconditionally.
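For context, a small sketch of why ZWNJ gets caught by this: U+200C has bidi class BN, and rule X9 of UAX #9 removes BN characters together with the explicit embedding and override controls. The helper name `chars_removed_by_x9` is made up for illustration and is not part of python-bidi; it only uses the standard `unicodedata` module.

```python
import unicodedata

# Bidi classes that rule X9 of UAX #9 removes before levels are resolved.
X9_REMOVED_CLASSES = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}


def chars_removed_by_x9(text):
    """List (index, code point, bidi class) for every character X9 would drop.

    Hypothetical helper for illustration only.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in X9_REMOVED_CLASSES
    ]


# Three Arabic letters, a ZWNJ, then another Arabic letter.
sample = "\u0645\u062A\u0646\u200C\u0648"
print(chars_removed_by_x9(sample))
```

Running a check like this on the input before and after the bidi step is one way to see whether a joiner was lost along the way.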
I tried to play with the code, but this seems to be more involved than I thought it would be. So, here is how I’d do the segmentation:
I’ve implemented @khaledhosny’s scheme, more or less.
Regarding paired characters: I made “opening” chars (mirrored chars with category `Ps`) look back, and “closing” chars (mirrored chars with category `Pe`) look forward.
I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored `Pi` and `Pf` as it’s less clear which is opening and which is closing.
I ended up with a relatively simple scheme that seems to match CoreText, at least in the examples that I tried. I bet it is far from perfect, but wow, BiDi is complex, and segmenting, too 😃
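A rough sketch of the look-back / look-forward idea described above, not the actual implementation. Two things here are assumptions: `crude_script` is a stand-in that guesses the script from the first word of the Unicode character name (the standard library has no Script property), and characters that are neither `Ps` nor `Pe` are made to look back first, which the comment does not specify.

```python
import unicodedata

# Assumption: a crude stand-in for the Unicode Script property.
_NAME_PREFIX_TO_SCRIPT = {"LATIN": "Latn", "ARABIC": "Arab", "HEBREW": "Hebr"}


def crude_script(ch):
    """Guess a script tag from the character name; None means 'no script of its own'."""
    name = unicodedata.name(ch, "")
    return _NAME_PREFIX_TO_SCRIPT.get(name.split(" ", 1)[0]) if name else None


def resolve_scripts(text):
    """Assign a script to every character, borrowing from neighbours where needed."""
    known = [crude_script(ch) for ch in text]

    def next_known(i):
        # Script of the nearest following character that has its own script.
        for script in known[i + 1:]:
            if script is not None:
                return script
        return None

    resolved = []
    for i, ch in enumerate(text):
        script = known[i]
        if script is None:
            back = resolved[-1] if resolved else None
            forward = next_known(i)
            closing = unicodedata.mirrored(ch) and unicodedata.category(ch) == "Pe"
            # Closing brackets prefer the following text; everything else,
            # including opening brackets (Ps), prefers the preceding text.
            script = (forward or back) if closing else (back or forward)
        resolved.append(script)
    return resolved


# Latin text with a parenthesised Hebrew word.
print(resolve_scripts("ab (\u05D0\u05D1) cd"))
```

With this rule the parentheses in the sample both end up with the surrounding Latin script, so the brackets stay in the same segment as the text that encloses them.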
I still use python-bidi to get bidi levels, but I don’t use its reordering anymore.
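For readers wondering what doing the reordering outside the library could look like, here is a minimal sketch of rule L2 from UAX #9, which reverses contiguous stretches from the highest level down to the lowest odd level. This is not taken from the code discussed here. The function name `reorder_visual` is hypothetical, and the levels are plain integers passed in by the caller, so no python-bidi API is assumed.

```python
def reorder_visual(items, levels):
    """Return items in visual order, following rule L2 of UAX #9.

    items:  characters or runs in logical order
    levels: resolved embedding level for each item
    Hypothetical helper for illustration only.
    """
    if len(items) != len(levels):
        raise ValueError("items and levels must have the same length")
    pairs = list(zip(levels, items))
    odd_levels = [lvl for lvl in levels if lvl % 2]
    if not odd_levels:
        return list(items)  # everything is left-to-right already

    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][0] >= level:
                # Find the end of this stretch at or above the current level.
                j = i
                while j < len(pairs) and pairs[j][0] >= level:
                    j += 1
                pairs[i:j] = reversed(pairs[i:j])
                i = j
            else:
                i += 1
    return [item for _, item in pairs]


# Upper-case letters stand in for right-to-left characters at level 1.
print("".join(reorder_visual(list("abcDEF"), [0, 0, 0, 1, 1, 1])))
```

Since nothing is dropped in this step, a ZWNJ that survives level resolution also survives the reordering.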