char_code/2: missing representation error

?- M = 1114110,char_code(C,M), char_code(C,N), M = N.
false.                         % incorrect

Converting a code to a character and back should always yield the very same code.

?- M = 1114110,char_code(C,M), char_code(C,N).
M = 1114110, C = '', N = 65534.            % incorrect

It seems you are only supporting Unicode 1.1: the result 65534 is 1114110 modulo 65536, so the code is being cut down to 16 bits. If so, a representation_error(character_code) should be raised for any integer that is not a supported character code.
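
The 16-bit cut-off matches what JavaScript's String.fromCharCode() does with its argument. Whether Tau Prolog actually goes through that function is an assumption here; the sketch below only reproduces the effect in plain JavaScript and contrasts it with String.fromCodePoint() and codePointAt(), which round-trip. The helper checkedCharFromCode and its error message are hypothetical names used for illustration.

```javascript
// Illustrative sketch only; not taken from the Tau Prolog source.
const M = 1114110; // 0x10FFFE

// String.fromCharCode keeps only the low 16 bits of its argument.
const truncated = String.fromCharCode(M).charCodeAt(0);

// String.fromCodePoint / codePointAt cover the full Unicode range.
const roundTripped = String.fromCodePoint(M).codePointAt(0);

// Hypothetical helper: refuse integers that are not Unicode code points
// instead of silently truncating them.
function checkedCharFromCode(code) {
  if (!Number.isInteger(code) || code < 0 || code > 0x10FFFF) {
    throw new RangeError("representation_error(character_code)");
  }
  return String.fromCodePoint(code);
}

console.log(truncated, roundTripped); // 65534 1114110
```

With a check of this kind, an out-of-range integer such as 1114112 is rejected up front rather than mapped to a different character.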

Already perfect:

?- char_code(a, non_integer).
uncaught exception: error(type_error(integer, non_integer), '/'(char_code, 2))

Issue Analytics

Created 5 years ago
Comments: 6

Top GitHub Comments

ghost commented, Jul 27, 2018

Not really. get_byte/1 deals with ISO byte streams, which are an external matter and completely independent of ISO text streams. It has nothing to do with how strings or atoms are represented internally; there is not a single ISO byte stream API that takes an atom as an argument.

Also, if you go the route of writing to an ISO text stream and then reading it back as an ISO byte stream, you only see what the text encoding does, but not how strings or atoms are represented under the hood in your host language.

As the JavaScript documentation makes clear, JavaScript strings use surrogate pairs. You would see the same pairs in a text stream only if you picked one specific text encoding, namely UTF-16.

U+0000 to U+D7FF and U+E000 to U+FFFF: Both UTF-16 and UCS-2 encode code points in this range as single 16-bit code units that are numerically equal to the corresponding code points.

U+10000 to U+10FFFF: 0x10000 is subtracted from the code point, leaving a 20-bit number. The top ten bits (a number in the range 0x0000…0x03FF) are added to 0xD800 to give the first 16-bit code unit or high surrogate, which will be in the range 0xD800…0xDBFF. The low ten bits (also in the range 0x0000…0x03FF) are added to 0xDC00 to give the second 16-bit code unit or low surrogate, which will be in the range 0xDC00…0xDFFF.

https://en.wikipedia.org/wiki/UTF-16
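
The encoding rule quoted above can be written out as a few lines of JavaScript. This is a minimal sketch of the arithmetic only; the function name toSurrogatePair is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;      // 20-bit number
  const high = 0xD800 + (offset >> 10);    // top ten bits
  const low = 0xDC00 + (offset & 0x3FF);   // low ten bits
  return [high, low];
}

// The code from the issue, 1114110 (0x10FFFE):
console.log(toSurrogatePair(1114110).map((u) => u.toString(16))); // [ 'dbff', 'dffe' ]
```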

In any case, anything that works on bytes is inefficient when charAt() already returns a 16-bit word, as the JavaScript documentation shows. So all you have to do is sometimes combine two words into a single code point.
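
Combining two words into one code point could look roughly like the sketch below. It reads the 16-bit units with charCodeAt() and is not the Tau Prolog implementation; the function name toCodePoints is hypothetical, and unpaired surrogates are simply passed through unchanged, which is one design choice among several.

```javascript
// Hypothetical helper: turn a JavaScript string into an array of code points
// by combining high/low surrogate pairs where they occur.
function toCodePoints(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const next = i + 1 < str.length ? str.charCodeAt(i + 1) : -1;
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    const isLow = next >= 0xDC00 && next <= 0xDFFF;
    if (isHigh && isLow) {
      // Reverse of the UTF-16 encoding rule: rebuild the 20-bit offset.
      codes.push(((unit - 0xD800) << 10) + (next - 0xDC00) + 0x10000);
      i++; // skip the low surrogate that was just consumed
    } else {
      codes.push(unit); // BMP character or unpaired surrogate
    }
  }
  return codes;
}

console.log(toCodePoints("a\uDBFF\uDFFE")); // [ 97, 1114110 ]
```

The built-in String.prototype.codePointAt() performs the same combination for a single position, so a real implementation could rely on it instead.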

ghost commented, Oct 28, 2018

I see, you are already having fun with Unicode: https://twitter.com/tau_prolog/status/1056350064560521217

And here is a new issue, https://github.com/jariazavalverde/tau-prolog/issues/56, as a Christmas present.
