Unicode string problem
In one of my unit tests I create a folder “Κάποιος κατάλογος” in a working directory, cd into it, and read the current working directory. Surprisingly, this gives me:
x chdir (0 of 1 succeeded) Failure: Expected ‘output/process-specs/currentPath/Κάποιος κατάλογος’ to be ‘output/process-specs/currentPath/Κάποιος κατάλογος’.
When I print both strings char by char I can see why the comparison fails. The string managed in JS is stored in decomposed form (NFD?), while the file system returns a composed string (probably NFC). This affects the two ά letters. Removing them makes the test succeed.
Obviously I have to normalize the strings, but Duktape doesn’t seem to support that. When I call normalize() I get an error. What alternatives do I have?
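To make the mismatch described above concrete, here is a small C sketch that dumps the UTF-8 bytes of the letter ά in both forms. It assumes the two strings really differ as NFC versus NFD, which the post only suspects; the names hex_dump and same_bytes are made up for this illustration and are not part of Duktape.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Composed (NFC) form: U+03AC GREEK SMALL LETTER ALPHA WITH TONOS. */
const char ALPHA_TONOS_NFC[] = "\xCE\xAC";

/* Decomposed (NFD) form: U+03B1 GREEK SMALL LETTER ALPHA followed by
 * U+0301 COMBINING ACUTE ACCENT. */
const char ALPHA_TONOS_NFD[] = "\xCE\xB1\xCC\x81";

/* Hypothetical helper: write the bytes of s into out as space-separated
 * uppercase hex. Output is cut at a byte boundary if out is too small. */
void hex_dump(const char *s, char *out, size_t out_size)
{
    size_t used = 0;

    assert(s != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        int n = snprintf(out + used, out_size - used, "%s%02X",
                         i > 0 ? " " : "",
                         (unsigned int)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';  /* drop the partially written byte */
            break;
        }
        used += (size_t)n;
    }
}

/* Hypothetical helper: byte-wise equality, which is all that a plain
 * string comparison checks. It knows nothing about normalization. */
int same_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Calling hex_dump on the two constants shows a two-byte sequence for the composed form and a four-byte sequence for the decomposed one, so same_bytes reports them as different even though both render as the same letter. Comparing such strings reliably requires normalizing both sides to the same form first.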
Issue Analytics
- State:
- Created 6 years ago
- Comments: 42 (27 by maintainers)

I agree, but JS is stuck with a bunch of legacy baggage to avoid “breaking the web”. It sucks, but it is what it is. 😃
That said, JS as a language is slowly getting better support for Unicode proper as time goes on: Unicode character escapes, /u mode for regexps, String.fromCodePoint(), etc.

Most C programs deal with strings only based on a pointer and don’t support strings with internal NULs. It may not always be up to the program either: most C libraries don’t deal with strings containing internal NULs.
So, duk_get_string() exists because in such programs it’s more convenient than calling duk_get_lstring() and declaring a length variable only to ignore it.
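The trade-off between the pointer-only and pointer-plus-length styles can be sketched in plain C without Duktape. The struct lstring type and the helper functions below are hypothetical stand-ins for the kind of value duk_get_lstring() reports through its length output; they are not Duktape APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length view of a string (not a Duktape type). */
struct lstring {
    const char *data;
    size_t len;
};

/* Seven bytes of payload, "foo", NUL, "bar"; the literal adds a
 * terminating NUL after them, so sizeof RAW is 8. */
const char RAW[] = "foo\0bar";

/* Pointer-only view: a NUL-terminated API sees data only up to the
 * first NUL byte. */
size_t visible_length_c_string(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Build a length-carrying view over existing bytes (no copy is made). */
struct lstring make_lstring(const char *data, size_t len)
{
    struct lstring v;

    assert(data != NULL);
    v.data = data;
    v.len = len;
    return v;
}

/* Length-aware equality: compares all bytes, including internal NULs. */
int lstring_equals(struct lstring a, struct lstring b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With the pointer-only view, RAW looks like the 3-byte string "foo", and strcmp cannot tell it apart from a real "foo". The length-carrying view keeps all 7 bytes and compares them correctly, at the cost of passing a length around everywhere, which is the inconvenience the comment above refers to.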