Improve bidi support

Hello,

We’re considering using CodeMirror as an XML editor in Knora, but the main obstacle for us would be the incomplete bidi support. We’d be interested in helping to improve it, at least by clarifying what doesn’t work and by testing fixes. (I’m an Arabic-speaking developer.)

The first thing I noticed when trying the XML autocomplete demo is that CodeMirror has the classic problem of jumbling text and punctuation in bidirectional text with markup. As the W3C says, in Problems with bidirectional source text in markup:

Unless your editor recognizes markup in source text as not being normal text, the strongly typed letters and punctuation in the markup will appear in places you wouldn’t expect, and sometimes interfere with the order of the content itself…

If you are dealing with content that is predominantly in a right-to-left script, the ideal solution would be a source editor that recognizes markup as a special construct, and protects it to produce a sensible order for the characters in the source text.

Here’s an example. Suppose I take this XML document:

<top>This is a test.<animal name="duck." type="bird">apple</animal></top>

Now I translate the text into Arabic:

  • This is a test: هذا اختبار
  • duck: بطة
  • bird: طائر
  • apple: تفاحة

The result looks like this:

<top>هذا اختبار.<animal name="بطة." type="طائر">تفاحة</animal></top>

CodeMirror displays it the same way:

[Screenshot: bidi-jumbled-ltr-context]

The problems are:

  • The positions of the words طائر and تفاحة are switched, the quotation mark (") after تفاحة is in the wrong place, and the right angle bracket (>) at the end of the animal tag is displayed as a left angle bracket.
  • Each of the full stops (.) after هذا اختبار and بطة should be to the left of the preceding text.

This happens because the Unicode bidi algorithm has incorrectly identified a sequence of characters containing punctuation as a run of RTL characters, or as a run of LTR characters.
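To make this concrete, here is a minimal Python sketch that lists the bidi class the Unicode character database assigns to each character of an attribute from the example. The helper name `bidi_classes` is made up for illustration; `unicodedata.bidirectional` and `unicodedata.mirrored` are part of the Python standard library. It only inspects character properties and does not run the bidi algorithm itself.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# An attribute taken from the translated example document.
sample = 'name="بطة."'

for ch, cls in bidi_classes(sample):
    # Print the code point rather than the glyph so the listing itself
    # is not reordered by the terminal's own bidi handling.
    print(f"U+{ord(ch):04X}  {cls}")

# Angle brackets are also flagged as mirrored, which is why ">" can be
# drawn as "<" when it ends up inside an RTL run.
print("'>' mirrored:", bool(unicodedata.mirrored(">")))
```

The Latin letters come out as `L` and the Arabic letters as `AL`, both strong types. The quotation mark and equals sign are `ON` (neutral) and the full stop is `CS` (a weak type), so their displayed position depends on the neighbouring strong characters rather than on their role in the markup.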

To solve this problem, it isn’t enough to add the attribute dir="rtl" to the html tag. If you do that, you get:

[Screenshot: bidi-jumbled-rtl-context]

This replaces the problems above with other problems:

  • The second tag looks like a <name> tag with an animal attribute, rather than an <animal> tag with a name attribute.
  • The slash inside the closing </animal> tag has moved inside the closing </top> tag.
  • The opening tags go from right to left, but the closing tags go from left to right.

Again, these are symptoms of the limitations of the Unicode bidi algorithm.

Fortunately this problem doesn’t seem to be difficult to solve in HTML. This can be done by adding <span> tags with appropriate dir attributes, as suggested in the W3C document Inline markup and bidirectional text in HTML. The following approach seems to work with Chrome version 50 and Firefox version 48:

  • Put a <span dir="ltr">...</span> around each XML tag.
  • Put a <span dir="rtl">...</span> around any RTL content enclosed by an element or in the value of an attribute.
  • For Firefox only: add a zero-width space (&#8203;) between two <span> elements if nothing else is separating them. (This is necessary only if the overall direction of the HTML document is RTL.)
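The steps above can be sketched as a small conversion function. This is only an illustration in Python, not CodeMirror code: the function names (`has_rtl`, `wrap_rtl`, `wrap_tag`, `xml_to_bidi_html`) are hypothetical, and the regex tokenizer assumes double-quoted attribute values and no `>` inside them, with no comments or CDATA sections. For simplicity it puts the zero-width space between all adjacent pieces, not only between two adjacent spans.

```python
import html
import re
import unicodedata

ZWSP = "&#8203;"  # zero-width space, the Firefox workaround described above

def has_rtl(text):
    """True if text contains at least one strongly right-to-left character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)

def wrap_rtl(text):
    """Escape text; wrap it in an RTL span if it contains RTL characters."""
    escaped = html.escape(text, quote=False)
    return f'<span dir="rtl">{escaped}</span>' if has_rtl(text) else escaped

def wrap_tag(tag):
    """Wrap one XML tag in an LTR span, isolating RTL attribute values."""
    # Splitting on a capturing group keeps the quoted values at odd indices.
    parts = re.split(r'("[^"]*")', tag)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append('"' + wrap_rtl(part[1:-1]) + '"')
        else:
            out.append(html.escape(part, quote=False))
    return '<span dir="ltr">' + "".join(out) + "</span>"

def xml_to_bidi_html(xml):
    """Convert XML source text to HTML following the span-wrapping steps."""
    pieces = []
    for token in re.findall(r"<[^>]*>|[^<]+", xml):
        if token.startswith("<"):
            pieces.append(wrap_tag(token))
        else:
            pieces.append(wrap_rtl(token))
    return ZWSP.join(pieces)

print(xml_to_bidi_html('<animal name="بطة.">تفاحة</animal>'))
```

In an editor the same wrapping would have to be driven by the syntax highlighter's tokens rather than a regex, since the highlighter already knows where tags and attribute values begin and end.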

The example XML document can be rendered correctly using the following HTML:

<span dir="ltr">&lt;top&gt;</span>&#8203;<span dir="rtl">هذا اختبار.</span>&#8203;<span dir="ltr">&lt;animal name="<span dir="rtl">بطة.</span>" type="<span dir="rtl">طائر</span>"&gt;</span>&#8203;<span dir="rtl">تفاحة</span>&#8203;<span dir="ltr">&lt;/animal&gt;</span>&#8203;<span dir="ltr">&lt;/top&gt;</span>

This works regardless of whether the overall direction of the HTML document is LTR or RTL. In an LTR context, the tags are ordered from left to right, the angle brackets are correct, and all the RTL content is displayed correctly and in the right places:

[Screenshot: bidi-ltr-context]

In an RTL context, the tags are ordered from right to left, and everything else is still correct:

[Screenshot: bidi-rtl-context]

Here’s a plain HTML page illustrating the problem and the proposed solution.

Does it seem feasible to implement this solution in CodeMirror? If so, how can we help?

Issue Analytics

  • State: open
  • Created 7 years ago
  • Reactions: 2
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

3 reactions
m2-farzan commented, Jun 7, 2020

I want to emphasize the point that @ahangarha is trying to make. Situations in which both RTL & LTR paragraphs exist in the same text container are pretty common. A simple example would be a Jupyter notebook markdown cell, which contains text paragraphs in an RTL language, as well as a latex equation block which should be displayed in LTR.

HTML5 addressed this problem with the dir=auto attribute, which detects the direction of each paragraph from its first strongly directional character. Applying this attribute to our cm-line objects seems to do the trick just fine, but if I understand correctly, @marijnh mentions that we need a way to find out which direction the browser has assigned to each cm-line. Unfortunately, I couldn’t find a way to do that.

But what if we detected the direction ourselves, instead of using dir=auto? We just have to check the first character in each paragraph, and a gist (here) already shows that it’s not that hard.
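As a rough illustration of that idea (not the linked gist, and not CodeMirror code), the following Python sketch guesses the direction of each line from its first strongly directional character, skipping leading punctuation, digits and spaces. The names `line_direction` and `directions`, and the choice of `"ltr"` as the fallback, are assumptions made for this example.

```python
import unicodedata

def line_direction(line, default="ltr"):
    """Guess a line's base direction from its first strong character."""
    for ch in line:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):   # Hebrew, Arabic and other RTL letters
            return "rtl"
        if cls == "L":           # Latin and other LTR letters
            return "ltr"
    # No strong character at all (empty line, digits, punctuation only).
    return default

def directions(text):
    """Return the guessed direction of every line in text."""
    return [line_direction(line) for line in text.splitlines()]

# A cell mixing an RTL paragraph with an equation block.
cell = "هذا اختبار\n$$x = 1$$"
print(directions(cell))
```

Because the direction is computed by the editor rather than by the browser, the result is known to the editor for every line, which is the feedback that dir=auto does not provide.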

I’m excited to work on it, as I’m frustrated with the way bidi text is handled in Jupyter notebook.

2 reactions
ahangarha commented, May 9, 2020

I have another issue regarding bidi. CodeMirror has a page related to bidi which, as I understand it, is not really about bidi. Bidi support means dealing with text that can be either RTL or LTR, and giving the browser a way to display that text in the right direction.

In my experience, adding dir="auto" to all elements that can contain RTL or LTR text solves the problem very effectively.

This is the result of a tweak I made on your site: [image]

By the way, I have made a Firefox add-on called Add Bidi Support that applies this approach to web pages. There are some screenshots of its effect on pages. Take a look.

Should I keep this suggestion here, or should I open another issue with almost the same title?
