BiDi bug with zero width non joiner

مَتْن‌وَنِوِشْتِه

This string contains U+200C (ZERO WIDTH NON-JOINER), but the bidi algorithm strips it, causing wrong rendering:

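[screenshot: rendering with the bidi algorithm enabled]

Disabling the bidi algorithm fixes the rendering:

[screenshot: rendering with the bidi algorithm disabled]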
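
To make the problem concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It checks that U+200C has bidi class BN (Boundary Neutral) and simulates what happens when every BN character is dropped. The helper name `strip_boundary_neutrals` and the shortened sample string are assumptions for illustration; this is not python-bidi's actual code.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Shortened, hypothetical sample: ARABIC LETTER NOON, ZWNJ, ARABIC LETTER WAW.
sample = "\u0646" + ZWNJ + "\u0648"


def strip_boundary_neutrals(text):
    """Simulate unconditional removal of Boundary Neutral (BN) characters.

    This only imitates the behaviour described in the issue; it is not
    taken from any library.
    """
    return "".join(ch for ch in text if unicodedata.bidirectional(ch) != "BN")


if __name__ == "__main__":
    print(unicodedata.name(ZWNJ), unicodedata.bidirectional(ZWNJ))
    stripped = strip_boundary_neutrals(sample)
    # The ZWNJ is gone, so a shaper no longer sees the request to break
    # the join between the two letters.
    print(len(sample), len(stripped), ZWNJ in stripped)
```

If the stripped string is what reaches the shaper, the two letters are free to join, which would explain the difference between the two screenshots.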

@typoman: am I right that the second screenshot is correct and the first isn’t?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

4 reactions
khaledhosny commented, Mar 1, 2020

Looks like python-bidi will need to be modified to fix the original issue anyway, as it removes Boundary Neutral characters unconditionally.

I tried to play with the code, but this seems to be more involved than I thought it would be. So, here is how I’d do the segmentation:

  1. Resolve the script of characters whose Script property is Inherited or Common (see https://unicode.org/reports/tr24/#Common):
    • Inherited characters (usually combining marks) take the script of the preceding character.
    • Common characters also take the script of the preceding character. Paired characters such as brackets may work better if both halves take the same script: in latin (ARABIC) latin, both brackets should take Latin script, rather than the first taking Latin and the second taking Arabic.
    • Any remaining unresolved characters at the start of the string take the script of the next resolved character.
  2. Run the bidi algorithm, segment the text into runs based on bidi level, and reorder the runs (but not the characters inside each run). A run’s direction depends on its bidi level (odd is RTL, even is LTR).
  3. Split the text into segments of the same script and direction, and pass each segment to HarfBuzz. Text passed to HarfBuzz should always be in logical order.
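
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the issue: `crude_script` is a hypothetical stand-in that guesses the script from the character name (the standard library does not expose the Script property), bidi levels are passed in by the caller instead of being computed, bracket pairing is left out, and the run reordering follows my reading of rule L2 of the Unicode Bidirectional Algorithm.

```python
import unicodedata


def crude_script(ch):
    """ASSUMPTION: rough stand-in for the Unicode Script property."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return "Inherited"  # combining marks
    name = unicodedata.name(ch, "")
    for script in ("Latin", "Arabic", "Hebrew"):
        if name.startswith(script.upper()):
            return script
    return "Common"


def resolve_scripts(text):
    """Step 1: give Inherited/Common characters a concrete script."""
    scripts = []
    previous = None
    for ch in text:
        script = crude_script(ch)
        if script in ("Inherited", "Common"):
            script = previous  # stays None at the start of the string
        previous = script
        scripts.append(script)
    # Leading unresolved characters take the next resolved script.
    following = None
    for i in reversed(range(len(scripts))):
        if scripts[i] is None:
            scripts[i] = following
        else:
            following = scripts[i]
    return scripts


def level_runs(levels):
    """Step 2a: (start, end, level) runs of equal bidi level, logical order."""
    runs, start = [], 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i, levels[start]))
            start = i
    return runs


def reorder_runs(runs):
    """Step 2b: reorder whole runs into visual order (UBA rule L2)."""
    runs = list(runs)
    odd = [level for _, _, level in runs if level % 2]
    if not odd:
        return runs
    highest = max(level for _, _, level in runs)
    for level in range(highest, min(odd) - 1, -1):
        i = 0
        while i < len(runs):
            j = i
            while j < len(runs) and runs[j][2] >= level:
                j += 1
            if j > i:
                runs[i:j] = runs[i:j][::-1]  # reverse runs, not characters
                i = j
            else:
                i += 1
    return runs


def segment(text, levels):
    """Step 3: (substring, script, direction) tuples in visual order.

    Each substring stays in logical order, ready to hand to a shaper.
    """
    scripts = resolve_scripts(text)
    segments = []
    for start, end, level in reorder_runs(level_runs(levels)):
        direction = "RTL" if level % 2 else "LTR"
        pieces, piece_start = [], start
        for i in range(start + 1, end + 1):
            if i == end or scripts[i] != scripts[piece_start]:
                pieces.append((text[piece_start:i], scripts[piece_start], direction))
                piece_start = i
        # Inside an RTL run, later pieces appear further to the left.
        segments.extend(reversed(pieces) if level % 2 else pieces)
    return segments


if __name__ == "__main__":
    # Levels written by hand for an LTR paragraph (assumption, not computed).
    text = "abc \u0645\u062a\u0646 def"
    levels = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
    for item in segment(text, levels):
        print(item)
```

Taking the levels as input keeps the sketch independent of any particular bidi library; in practice they would come from a bidi implementation that does not discard characters such as U+200C.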

1 reaction
justvanrossum commented, Mar 8, 2020

I’ve implemented @khaledhosny’s scheme, more or less.

Regarding paired characters: I made “opening” chars (mirrored chars with category Ps) look back, and “closing” chars (mirrored chars with category Pe) look forward.

I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored Pi and Pf as it’s less clear which is opening and which is closing.
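
As a rough illustration of the look-back/look-forward rule described above, the sketch below classifies brackets with the standard `unicodedata` module and resolves their script from the nearest neighbour with a real script. The function names and the list-of-script-names input are assumptions made for this example; this is one reading of the comment, not the code that was actually committed.

```python
import unicodedata

UNRESOLVED = ("Common", "Inherited")


def bracket_lookup_direction(ch):
    """Return 'back' for opening brackets, 'forward' for closing ones.

    Only mirrored characters of category Ps or Pe count; Pi and Pf
    (initial/final quotes) are ignored, as in the comment above.
    """
    if not unicodedata.mirrored(ch):
        return None
    category = unicodedata.category(ch)
    if category == "Ps":
        return "back"
    if category == "Pe":
        return "forward"
    return None


def resolve_bracket_scripts(text, scripts):
    """Give each bracket the script of its nearest resolved neighbour.

    `scripts` is a hypothetical per-character list of script names in
    which brackets and spaces are still marked "Common".
    """
    resolved = list(scripts)
    for i, ch in enumerate(text):
        direction = bracket_lookup_direction(ch)
        if direction is None:
            continue
        if direction == "back":
            neighbours = range(i - 1, -1, -1)
        else:
            neighbours = range(i + 1, len(text))
        for j in neighbours:
            if scripts[j] not in UNRESOLVED:
                resolved[i] = scripts[j]
                break
    return resolved


if __name__ == "__main__":
    text = "ab (\u0645) cd"
    scripts = ["Latin", "Latin", "Common", "Common", "Arabic",
               "Common", "Common", "Latin", "Latin"]
    print(resolve_bracket_scripts(text, scripts))
```

With this rule both brackets in latin (ARABIC) latin end up with Latin script without any pair matching, which is the outcome the segmentation scheme asks for.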

I ended up with a relatively simple scheme that seems to match CoreText, at least in the examples that I tried. I bet it is far from perfect, but wow, BiDi is complex, and segmenting, too 😃

I still use python-bidi to get bidi levels, but I don’t use its reordering anymore.


