fix empty cell representation
Currently an empty cell in the buffer is represented by a single whitespace, which causes several issues for any functionality that operates on the string representations. Affected are at least:
- selection manager
- linkifier
- search addon
- prolly copy & paste (not tested)
- reflow resize, once we support this
Most of these rely on Buffer.translateBufferLineToString, which tries to deal with empty cells via the trimRight flag. Still, there are circumstances where it cannot be decided at buffer level how to create the correct string representation, see https://github.com/xtermjs/xterm.js/issues/791#issuecomment-403284022.
Some workarounds are in place that try to fix those wrongly gathered string data by peeking again into the buffer, others will simply break at the edge cases (esp. the last right border cell thing is really nasty).
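A minimal sketch of why trimming cannot be decided at string level when empty cells are stored as whitespace. The `Row` type and `rowToString` helper are hypothetical stand-ins for illustration, not the actual Buffer.translateBufferLineToString implementation:

```typescript
// Hypothetical model: one character per cell, empty cells stored as ' '.
type Row = string[];

// Join the cells and optionally strip trailing whitespace,
// roughly what a trimRight-style flag would do.
function rowToString(row: Row, trimRight: boolean): string {
  const s = row.join('');
  return trimRight ? s.replace(/ +$/, '') : s;
}

// 'ab', then one space really written by the application, then two untouched cells.
const written: Row = ['a', 'b', ' ', ' ', ' '];
// 'ab' followed by three untouched cells.
const untouched: Row = ['a', 'b', ' ', ' ', ' '];

// Both rows hold identical data, so no flag can tell them apart:
// trimming drops the written space, not trimming keeps the empty cells.
const sameUntrimmed = rowToString(written, false) === rowToString(untouched, false);
const sameTrimmed = rowToString(written, true) === rowToString(untouched, true);
```

Both comparisons come out equal, which is the information loss that the workarounds try to recover by peeking into the buffer again.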
To get a more uniform handling of empty cells without the need of quirky patches here and there I suggest to define an empty cell to be a value that cannot be part of buffer string by normal means - any of the control chars would do (since the input handler will filter/trigger actions for control chars those will not end up as cell content values in the buffer). Imho the “hottest” candidate is the null byte ‘\x00’ for several reasons:
- a 0 value kinda implies there is nothing
- easy translation with upcoming content pointer (where integer value stands for the UTF32 value, ergo 0 translates to ‘\u0000’)
Why not simply use an empty string or null for an empty cell? Imho this would complicate things even further, since empty cells between others would “collapse” in a JS string, while a placeholder can preserve the cell padding. (There is another reason: we have to cover a third state, the cells after fullwidth chars, which would return an empty string.) Imagine this input:
- string ‘ab’
- cursor move one right
- string ‘c’
- cursor move one right
- string ‘ ’ (one whitespace)
leads to these buffer states:
current
['a', 'b', ' ', 'd', ' ', ' ', ' '] ==> 'ab d   ' // trim would cut input whitespace
vs. ''
['a', 'b', '', 'd', '', ' ', ''] ==> 'abd ' // collapsed, padding broken
vs. '\x00'
['a', 'b', '\x00', 'd', '\x00', ' ', '\x00'] ==> 'ab\x00d\x00 \x00'
The third string resembles the buffer state better than the others. The right border problem now can be solved by simply trimming ‘\x00’ which correctly leads to
'ab\x00d\x00 '
In a last step the placeholder could be replaced by whatever is needed for further processing (most likely with whitespace).
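The trim-then-replace idea could be sketched like this. `NULL_CELL` and `cellsToString` are hypothetical names chosen for illustration, and the cell array is a simplified model of a buffer line:

```typescript
// Hypothetical placeholder for an empty cell, as proposed above.
const NULL_CELL = '\x00';

// Translate a row of cells into a string:
// 1. join the cells, 2. trim trailing placeholders (the right border),
// 3. replace the remaining placeholders with a fill character.
function cellsToString(cells: string[], fill: string = ' '): string {
  const raw = cells.join('');
  const trimmed = raw.replace(/\x00+$/, '');
  return trimmed.split(NULL_CELL).join(fill);
}

// Buffer state from the example: the whitespace in cell 5 was real input.
const cells = ['a', 'b', NULL_CELL, 'd', NULL_CELL, ' ', NULL_CELL];
const text = cellsToString(cells);
```

With this, the input whitespace survives the trim while the unused right border cell is dropped, and inner empty cells still keep their padding.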
Up for discussion. There might be other representation tricks to leverage fast built in string methods with the fullwidth chars too. Also it might have a negative impact on the renderer speed due to an additional check against the placeholder.
Top GitHub Comments
Someone not knowing anything about xterm.js’s internals chiming in:
When the output is “ab” followed by cursor right movements followed by “c” (or “d”, the opening post is inconsistent here, never mind), the desired behavior for all kinds of operations like copy-n-paste, autolocating URLs, searching etc. is to treat it as “ab[spaces]c” and not “abc”, correct? That is, those “unused” cells have to be treated as spaces. This kind of output is easily possible with screen drawing libraries like ncurses which don’t make a difference between spaces and unused cells.
Another typical example is when the output is “ab[tab]c”. Most terminal emulators (not sure about xterm.js) treat TAB as a cursor moving operation only, preserving the characters and their attributes that it skips over. Again, for copy-pasting and such, you wish to get “ab[spaces]c”, or even better “ab[tab]c” if you have some extra special magic to preserve tabs, but definitely not “abc”.
With this in mind, if we’re on the same track so far, can you think of any example where a line would contain an “unused” (‘\x00’, or whatever you pick) cell followed by a “used” one? We in VTE cannot. If you agree, probably the cleanest approach is to start with a new per-row variable containing the number of used cells (aka the offset where the unused ones begin). This can be maintained in O(1).
With the help of this variable, you may or may not still put spaces in the used part and 0’s in the unused. If you want to, you can probably do it in O(1) average time given the average use of the terminal, just adjust the cells between the old and new value of this variable. O(n) would be the worst case, I wouldn’t worry too much about it. Or you can maybe even have spaces everywhere and keep this operation O(1) all the time.
(Probably VTE isn’t ideal here either. We do have this per-row “used length” variable, but we also tend to have spaces in used cells and ‘\x00’s in unused ones. It might even be more messy than this. I’m not that super familiar with this part of our code. And on a side note, “unused” cells still have their background color tracked.)
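A rough sketch of the per-row “used length” idea, in the “spaces everywhere” variant. The `TrackedRow` class and its members are hypothetical and match neither VTE’s nor xterm.js’s actual code:

```typescript
// Hypothetical row that stores spaces in every cell and tracks
// the offset where the unused cells begin.
class TrackedRow {
  readonly cells: string[] = [];
  usedLength = 0;

  constructor(cols: number) {
    for (let i = 0; i < cols; i++) this.cells.push(' ');
  }

  // Write one character at column x. Updating usedLength is O(1);
  // cells skipped by cursor movement already hold spaces.
  write(x: number, ch: string): void {
    this.cells[x] = ch;
    if (x + 1 > this.usedLength) this.usedLength = x + 1;
  }

  // String form of the used part only; trailing unused cells are cut.
  toText(): string {
    return this.cells.slice(0, this.usedLength).join('');
  }
}

// 'ab', cursor right, 'd', cursor right, one written whitespace, in 7 columns.
const row = new TrackedRow(7);
row.write(0, 'a');
row.write(1, 'b');
row.write(3, 'd');
row.write(5, ' ');
const rowText = row.toText();
```

Here the written whitespace at column 5 is kept because it moved usedLength forward, while column 6 stays outside the used part. How erase operations should adjust the variable is left open in this sketch.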
isWrapped (assuming it is what I believe it is 😃) should be kept separate from whether or not there are trailing unused cells in a line, these two are orthogonal. All 4 possibilities are (or should be) possible. 3 of them are trivial. A line can have just a few characters (after which unused cells remain) and then an explicit newline. It can have exactly as many characters as the width (no unused cells) and then an explicit newline. Or it can overflow to the next line (no unused cells, isWrapped).
The 4th, nontrivial case, assuming 80 columns: print 79 English letters followed by a CJK. There’s no room to place it at the end of the line, hence an unused cell remains there, but the line isWrapped to the next one, beginning with this CJK (is this xterm.js’s behavior?). That empty cell should be ignored for copy-pasting, searching etc. actions, and possibly re-filled (and similarly unused fields appearing newly) once you support reflow on resize.
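The nontrivial case can be sketched with a small model, using 4 columns instead of 80 to keep it short. The `WrapRow` shape, the `wrapsToNext` flag and `joinRows` are assumptions made for illustration; in particular `wrapsToNext` is not xterm.js’s isWrapped, whose exact semantics may differ:

```typescript
// Hypothetical row model for this sketch.
interface WrapRow {
  cells: string[];      // '' marks the second cell of a fullwidth character
  usedLength: number;   // offset where the unused cells begin
  wrapsToNext: boolean; // true if the content continues on the next row
}

// Join rows into logical lines, ignoring unused trailing cells,
// including the one left over when a fullwidth char did not fit.
function joinRows(rows: WrapRow[]): string[] {
  const lines: string[] = [];
  let current = '';
  for (const row of rows) {
    current += row.cells.slice(0, row.usedLength).join('');
    if (!row.wrapsToNext) {
      lines.push(current);
      current = '';
    }
  }
  if (current !== '') lines.push(current);
  return lines;
}

// 4 columns: 'abc' fills three cells, the CJK char needs two and wraps.
const rows: WrapRow[] = [
  { cells: ['a', 'b', 'c', ' '], usedLength: 3, wrapsToNext: true },
  { cells: ['中', '', ' ', ' '], usedLength: 2, wrapsToNext: false },
];
const logicalLines = joinRows(rows);
```

Because the wrap flag and the used length are tracked separately, the leftover cell at the end of the first row does not show up as a space between “c” and the CJK character.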
With #1775 we now have an explicit null char representation. Thanks to all for the valuable input.