Unicode string problem

In one of my unit tests I create a folder “Κάποιος κατάλογος” in a workdir, then cd into it and read the current working directory. This surprisingly gives me:

x chdir (0 of 1 succeeded) Failure: Expected ‘output/process-specs/currentPath/Κάποιος κατάλογος’ to be ‘output/process-specs/currentPath/Κάποιος κατάλογος’.

When I print both strings char by char I can see why the comparison fails. The string managed in JS is stored in decomposed form (NFD?) while the file system returns a composed string (probably NFC). The difference affects the two ά letters; removing them makes the test succeed.

Obviously I have to normalize the strings, but Duktape doesn’t seem to support that: when I call normalize() I get an error. What alternatives do I have?
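
One possible workaround, sketched here under stated assumptions rather than taken from the issue thread: use String.prototype.normalize() where the engine provides it, and otherwise fall back to a small hand-written composition table. The helper names composeGreekTonos and normalizeForCompare are hypothetical, and the table only handles seven lowercase Greek vowels followed by U+0301, so it is not a substitute for full NFC normalization.

```javascript
// Sketch only: composeGreekTonos and normalizeForCompare are hypothetical
// helpers, not Duktape APIs. The table covers just seven lowercase Greek
// vowels followed by U+0301 (combining acute accent); it is NOT full NFC.
var GREEK_TONOS_COMPOSITIONS = {
  '\u03B1\u0301': '\u03AC', // alpha with tonos
  '\u03B5\u0301': '\u03AD', // epsilon with tonos
  '\u03B7\u0301': '\u03AE', // eta with tonos
  '\u03B9\u0301': '\u03AF', // iota with tonos
  '\u03BF\u0301': '\u03CC', // omicron with tonos
  '\u03C5\u0301': '\u03CD', // upsilon with tonos
  '\u03C9\u0301': '\u03CE'  // omega with tonos
};

// Replace each "vowel + combining acute" pair with its precomposed letter.
function composeGreekTonos(s) {
  return s.replace(/[\u03B1\u03B5\u03B7\u03B9\u03BF\u03C5\u03C9]\u0301/g, function (pair) {
    return GREEK_TONOS_COMPOSITIONS[pair];
  });
}

// Prefer the built-in normalizer when present; otherwise use the table.
function normalizeForCompare(s) {
  if (typeof String.prototype.normalize === 'function') {
    return s.normalize('NFC');
  }
  return composeGreekTonos(s);
}

// "Κάποιος" written two ways: alpha + U+0301 versus precomposed U+03AC.
var decomposed = '\u039A\u03B1\u0301\u03C0\u03BF\u03B9\u03BF\u03C2';
var composed = '\u039A\u03AC\u03C0\u03BF\u03B9\u03BF\u03C2';

// The raw strings differ; the normalized ones compare equal.
var rawEqual = decomposed === composed;
var normalizedEqual = normalizeForCompare(decomposed) === normalizeForCompare(composed);
```

In a test, both the expected and the actual path would go through normalizeForCompare before the comparison. For anything beyond this narrow case, a complete normalization library or doing the normalization on the C side is likely the safer route.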

Issue Analytics

Created 6 years ago
Comments: 42 (27 by maintainers)

Top GitHub Comments

fatcerberus commented, Sep 5, 2017

I agree, but JS is stuck with a bunch of legacy baggage to avoid “breaking the web”. It sucks, but it is what it is. 😃

That said, JS as a language is slowly getting better support for Unicode proper as time goes on. Unicode character escapes, /u mode for regexps, String.fromCodePoint(), etc.
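
As a brief illustration of the features just listed, the following sketch assumes an ES2015-capable engine such as Node.js (not necessarily Duktape). The variable names are illustrative only.

```javascript
// Code point escape: one astral character written directly (ES2015).
var viaEscape = '\u{1F600}';

// String.fromCodePoint accepts code points above U+FFFF,
// unlike String.fromCharCode, which works on 16-bit code units.
var viaCodePoint = String.fromCodePoint(0x1F600);

var sameString = viaEscape === viaCodePoint;   // true

// Strings are still sequences of UTF-16 code units, so this
// single character has length 2 (a surrogate pair).
var codeUnits = viaEscape.length;              // 2

// Without /u, "." matches one code unit, so the pair does not match ^.$;
// with /u, "." matches one whole code point.
var matchesWithoutU = /^.$/.test(viaEscape);   // false
var matchesWithU = /^.$/u.test(viaEscape);     // true
```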

svaarala commented, Sep 5, 2017

Most C programs deal with strings only based on a pointer and don’t support strings with internal NULs. It may not always be up to the program either: most C libraries don’t deal with strings containing internal NULs.

So, duk_get_string() exists because in such programs it’s more convenient than calling duk_get_lstring() and declaring a length variable only to ignore it.
