Unicode string problem

In one of my unit tests I create a folder “Κάποιος κατάλογος” in a working directory, cd into it, and read the current working directory. Surprisingly, this gives me:

x chdir (0 of 1 succeeded) Failure: Expected ‘output/process-specs/currentPath/Κάποιος κατάλογος’ to be ‘output/process-specs/currentPath/Κάποιος κατάλογος’.

When I print both strings char by char I can see why the comparison fails. The string managed in JS is stored in decomposed form (NFD?), while the file system returns a composed string (probably NFC). This affects the two ά letters; removing them makes the test succeed.
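The char-by-char comparison described above can be reproduced with a small helper. This is a minimal sketch; `dumpCodeUnits` is a hypothetical name, not something from the issue, and it sticks to ES5 features so it should not depend on newer engine support.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string in hex.
// Note that characters outside the BMP show up as two surrogate code units.
function dumpCodeUnits(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    var hex = s.charCodeAt(i).toString(16).toUpperCase();
    while (hex.length < 4) {
      hex = "0" + hex; // pad to at least four hex digits
    }
    out.push("U+" + hex);
  }
  return out;
}

// Composed ά is one code unit; decomposed ά is α followed by a combining acute.
var composedDump = dumpCodeUnits("\u03AC");         // ["U+03AC"]
var decomposedDump = dumpCodeUnits("\u03B1\u0301"); // ["U+03B1", "U+0301"]
```

Dumping both the expected and the actual path this way makes the differing positions visible even though the two strings render identically.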

Obviously I have to normalize the strings; however, Duktape doesn’t seem to support that. When I call normalize() I get an error. What alternatives do I have?
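One possible workaround, sketched here under assumptions and not taken from the thread, is to compose only the sequences a test actually needs before comparing. The table and the `composeKnown` name below are hypothetical. The sketch covers just the seven lowercase Greek vowels with tonos, so it is not a general NFC implementation.

```javascript
// Hypothetical lookup table: decomposed sequence -> precomposed character.
// Illustrative only; it covers lowercase Greek vowels with tonos and nothing else.
var COMPOSE_TABLE = {
  "\u03B1\u0301": "\u03AC", // ά
  "\u03B5\u0301": "\u03AD", // έ
  "\u03B7\u0301": "\u03AE", // ή
  "\u03B9\u0301": "\u03AF", // ί
  "\u03BF\u0301": "\u03CC", // ό
  "\u03C5\u0301": "\u03CD", // ύ
  "\u03C9\u0301": "\u03CE"  // ώ
};

// Replace every known "vowel + combining acute" pair with its composed form.
// Strings that are already composed pass through unchanged.
function composeKnown(s) {
  return s.replace(
    /[\u03B1\u03B5\u03B7\u03B9\u03BF\u03C5\u03C9]\u0301/g,
    function (pair) {
      return COMPOSE_TABLE[pair];
    }
  );
}

// Folder name from the issue, written once decomposed and once composed.
var decomposedName = "\u039A\u03B1\u0301\u03C0\u03BF\u03B9\u03BF\u03C2 \u03BA\u03B1\u03C4\u03B1\u0301\u03BB\u03BF\u03B3\u03BF\u03C2";
var composedName = "\u039A\u03AC\u03C0\u03BF\u03B9\u03BF\u03C2 \u03BA\u03B1\u03C4\u03AC\u03BB\u03BF\u03B3\u03BF\u03C2";
var namesMatch = composeKnown(decomposedName) === composeKnown(composedName);
```

Applying the same function to both sides of the comparison keeps the test independent of which form the file system happens to return. Anything beyond the listed pairs would need a fuller normalization table or a native-side solution.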

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 42 (27 by maintainers)

Top GitHub Comments

1 reaction
fatcerberus commented, Sep 5, 2017

I agree, but JS is stuck with a bunch of legacy baggage to avoid “breaking the web”. It sucks, but it is what it is. 😃

That said, JS as a language is slowly getting better support for Unicode proper as time goes on. Unicode character escapes, /u mode for regexps, String.fromCodePoint(), etc.
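The three features named above can be tried side by side. This is a minimal sketch that assumes an ES2015-capable engine such as Node.js. The thread does not say which of these Duktape supports, so treat that as unknown.

```javascript
// U+1F603 lies outside the BMP, so it occupies two UTF-16 code units.
var viaEscape = "\u{1F603}";                      // code point escape
var viaFromCodePoint = String.fromCodePoint(0x1F603);

var sameString = viaEscape === viaFromCodePoint;  // true
var codeUnitLength = viaEscape.length;            // 2

// Without the u flag, "." matches a single code unit, so the pair does not fit.
var dotWithoutU = /^.$/.test(viaEscape);          // false
// With the u flag, "." matches a whole code point.
var dotWithU = /^.$/u.test(viaEscape);            // true
```

None of these features performs normalization, so they do not solve the composed versus decomposed mismatch on their own.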

0 reactions
svaarala commented, Sep 5, 2017

Most C programs deal with strings only based on a pointer and don’t support strings with internal NULs. It may not always be up to the program either: most C libraries don’t deal with strings containing internal NULs.

So, duk_get_string() exists because in such programs it’s more convenient than calling duk_get_lstring() and declaring and then ignoring a length variable.
