BiDi bug with zero width non joiner
This string contains U+200C (ZERO WIDTH NON-JOINER), but the bidi algorithm strips it, causing wrong rendering:
Disabling the bidi algorithm fixes the rendering:
@typoman: is my conclusion correct in that the second screenshot is correct, and the first isn’t?
- Created 4 years ago
- Comments:11 (9 by maintainers)
Top GitHub Comments
Looks like python-bidi will need to be modified to fix the original issue anyway, as it removes Boundary Neutral characters unconditionally.
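To illustrate why an unconditional removal hits this character: U+200C has the bidi class BN (Boundary Neutral), which Python's standard `unicodedata` module can confirm. The sketch below is not python-bidi's code; `strip_boundary_neutrals` is a hypothetical stand-in for any step that drops every BN character, and the Persian word is just an example input.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# The bidi class of ZWNJ is BN (Boundary Neutral).
zwnj_bidi_class = unicodedata.bidirectional(ZWNJ)


def strip_boundary_neutrals(text):
    """Hypothetical stand-in for a step that removes all BN characters."""
    return "".join(ch for ch in text if unicodedata.bidirectional(ch) != "BN")


# Example input (assumption): a Persian word with a ZWNJ after its second letter.
word = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
stripped = strip_boundary_neutrals(word)

# The ZWNJ is gone, so a shaper would be free to join the letters around it.
print(zwnj_bidi_class, len(word), len(stripped), ZWNJ in stripped)
```

Once the character is removed before shaping, the shaper has no way to know that the two neighbouring letters should not connect.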
I tried to play with the code, but this seems to be more involved than I thought it would be. So, here is how I’d do the segmentation:
- Resolve the script of characters with the Inherited and Common script properties (see https://unicode.org/reports/tr24/#Common):
  - Inherited characters (usually combining marks) take the script of the preceding character.
  - Common characters also take the script of the preceding character. Paired characters such as brackets might be better off taking the same script: in latin (ARABIC) latin, both brackets should take Latin script, instead of the first taking Latin and the second taking Arabic.
  - Any remaining unresolved characters at the start of the string take the script of the next resolved character.
- Run the bidi algorithm, segment into runs based on bidi level, and reorder the runs (but not the characters inside each run). A run's direction depends on its bidi level (odd is RTL, even is LTR).
- Split the text into runs of the same script and direction, and pass these to HarfBuzz. Text passed to HarfBuzz should always be in logical order.
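The steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not the code from this issue: `toy_script` is a deliberately crude stand-in for real Unicode Script data (which the standard library's `unicodedata` does not provide), the bidi levels are written by hand rather than computed by a bidi implementation, and `Run`, `segment` and `reorder_runs` are hypothetical names. The special handling of brackets is left out here.

```python
from typing import NamedTuple, Optional

UNRESOLVED = ("Zyyy", "Zinh")  # ISO 15924 codes for Common and Inherited


def toy_script(ch):
    """Crude script lookup (assumption): Latin, Arabic, else Common/Inherited."""
    cp = ord(ch)
    if 0x0300 <= cp <= 0x036F:
        return "Zinh"  # combining diacritical marks
    if 0x0600 <= cp <= 0x06FF and ch.isalpha():
        return "Arab"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return "Zyyy"


def resolve_scripts(text, script_of=toy_script):
    scripts, previous = [], None
    for ch in text:
        sc = script_of(ch)
        if sc in UNRESOLVED:
            sc = previous  # take the preceding character's script (may be None)
        scripts.append(sc)
        previous = sc
    # Unresolved characters at the start take the next resolved script.
    following = None
    for i in reversed(range(len(scripts))):
        if scripts[i] is None:
            scripts[i] = following
        else:
            following = scripts[i]
    return scripts


class Run(NamedTuple):
    text: str               # characters stay in logical order
    script: Optional[str]   # None only if nothing in the text has a script
    level: int

    @property
    def direction(self):
        return "RTL" if self.level % 2 else "LTR"


def segment(text, levels):
    """Split text into runs of equal (script, bidi level), in logical order."""
    scripts = resolve_scripts(text)
    runs, start = [], 0
    for i in range(1, len(text) + 1):
        if i == len(text) or (scripts[i], levels[i]) != (scripts[start], levels[start]):
            runs.append(Run(text[start:i], scripts[start], levels[start]))
            start = i
    return runs


def reorder_runs(runs):
    """Reorder whole runs into visual order without touching their contents."""
    runs = list(runs)
    odd_levels = [r.level for r in runs if r.level % 2]
    if not odd_levels:
        return runs
    highest = max(r.level for r in runs)
    # From the highest level down to the lowest odd level, reverse each
    # contiguous sequence of runs at that level or higher.
    for level in range(highest, min(odd_levels) - 1, -1):
        i = 0
        while i < len(runs):
            j = i
            while j < len(runs) and runs[j].level >= level:
                j += 1
            runs[i:j] = runs[i:j][::-1]
            i = max(j, i + 1)
    return runs


# Arabic, Latin, Arabic in an RTL paragraph; levels are hand-written assumptions.
text = "\u0627\u0628 ab \u062c"
levels = [1, 1, 1, 2, 2, 1, 1]
logical = segment(text, levels)
visual = reorder_runs(logical)
```

Each `Run` would then be shaped separately, with `run.text` still in logical order and `run.direction` and `run.script` passed along to the shaper.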
I’ve implemented @khaledhosny’s scheme, more or less.
Regarding paired characters: I made “opening” chars (mirrored chars with category `Ps`) look back, and “closing” chars (mirrored chars with category `Pe`) look forward.
I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored `Pf` as it’s less clear which is opening and which is closing.
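A minimal sketch of that look-back/look-forward rule, assuming a toy script lookup in place of real Unicode Script data. This is not the actual implementation from this thread; `toy_script`, `is_closing` and `next_real_script` are hypothetical names. Opening brackets need no special case in this sketch, because looking back is already the default for Common characters.

```python
import unicodedata

UNRESOLVED = ("Zyyy", "Zinh")  # ISO 15924 codes for Common and Inherited


def toy_script(ch):
    """Crude script lookup (assumption): Latin, Arabic, everything else Common."""
    if 0x0600 <= ord(ch) <= 0x06FF and ch.isalpha():
        return "Arab"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return "Zyyy"


def is_closing(ch):
    # "Closing" char: mirrored, with general category Pe (close punctuation).
    return bool(unicodedata.mirrored(ch)) and unicodedata.category(ch) == "Pe"


def next_real_script(raw_scripts, index):
    """Script of the next character after index that has a script of its own."""
    for sc in raw_scripts[index + 1:]:
        if sc not in UNRESOLVED:
            return sc
    return None


def resolve_scripts(text, script_of=toy_script):
    raw = [script_of(ch) for ch in text]
    resolved, previous = [], None
    for i, ch in enumerate(text):
        sc = raw[i]
        if sc in UNRESOLVED:
            if is_closing(ch):
                # Look forward; fall back to looking back at the end of the text.
                sc = next_real_script(raw, i) or previous
            else:
                # Default for Common/Inherited, including opening brackets.
                sc = previous
        resolved.append(sc)
        previous = sc
    # Unresolved characters at the start take the next resolved script.
    following = None
    for i in reversed(range(len(resolved))):
        if resolved[i] is None:
            resolved[i] = following
        else:
            following = resolved[i]
    return resolved


# Latin text with a parenthesised Arabic word: both brackets resolve to Latin.
text = "abc (\u0627\u0628\u062c) def"
scripts = resolve_scripts(text)
print(list(zip(text, scripts)))
```

Without the `is_closing` branch, the closing bracket would inherit Arabic from the word before it, which is the mismatch described above.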
I ended up with a relatively simple scheme that seems to match CoreText, at least in the examples that I tried. I bet it is far from perfect, but wow, BiDi is complex, and so is segmenting 😃
I still use python-bidi to get bidi levels, but I don’t use its reordering anymore.