Consider UTF-8?
We are using UTF-16, which has some cross-platform compatibility issues, starting with the code unit type differing by platform: wchar_t on Windows and char16_t everywhere else. The former is a fundamental type (not requiring an include), while the latter is defined as an unsigned short. This split exists because 'nix toolchains picked UTF-32 (which is excessive for most uses) as their default wide type, so wchar_t is typically 32 bits wide there.
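To make the type mismatch concrete, here is a minimal sketch that reports the width of each candidate code unit type on the current toolchain. The constant names are made up for illustration and are not taken from the codebase under discussion.

```cpp
#include <climits>
#include <cstddef>

// Width in bits of each candidate code unit type on this toolchain.
// (Names are hypothetical, chosen only for this sketch.)
constexpr std::size_t kWcharBits  = sizeof(wchar_t)  * CHAR_BIT;
constexpr std::size_t kChar16Bits = sizeof(char16_t) * CHAR_BIT;
constexpr std::size_t kCharBits   = sizeof(char)     * CHAR_BIT;

// char16_t is guaranteed to hold a UTF-16 code unit on every platform.
static_assert(kChar16Bits >= 16, "char16_t must hold a UTF-16 code unit");

// True only where wchar_t happens to be UTF-16 sized (typically Windows).
// On most 'nix toolchains wchar_t is 32 bits, so this is false there.
constexpr bool kWcharIsUtf16Sized = (kWcharBits == 16);
```

A char-based UTF-8 representation avoids the question entirely, since char is the same size on every platform.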
UTF-8 is more cross-platform and has been supported on Windows since Windows 10. For earlier versions of Windows, it should be possible to convert to UTF-16 for the things that require it. A quick search suggests that V8 might use UTF-8.
I am not sure whether or not UTF-8 would be easier for our text-related projects, like RegEx support. @rhuanjl do you have any thoughts?
Issue Analytics
- State:
- Created 3 years ago
- Comments: 12 (6 by maintainers)
Top GitHub Comments
IMO this is a really strong argument for it
The key challenge, though, is ensuring that the observable behaviour matches UTF-16 at the points where the spec defines it.
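As one example of such observable behaviour, a JS string's length is counted in UTF-16 code units, so a UTF-8 backed string would have to report the UTF-16 count rather than its byte count. A rough sketch, assuming well-formed UTF-8 input and using a hypothetical function name:

```cpp
#include <cstddef>
#include <string>

// Returns how many UTF-16 code units the given UTF-8 text would occupy.
// Assumes the input is well-formed UTF-8 (no validation is done here).
std::size_t Utf16LengthOfUtf8(const std::string& utf8) {
    std::size_t units = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++units;  // ASCII or lead byte: starts a code point
        if (b >= 0xF0) ++units;           // 4-byte sequence: needs a surrogate pair
    }
    return units;
}
```

Note that this is a linear scan, whereas the UTF-16 representation gives the length in constant time; indexing by code unit has the same problem, which is part of why matching the spec is the hard bit.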
One option (if memory efficiency is the aim) would be to stick to 8-bit chars for ASCII-only strings and use UTF-16 strings whenever non-ASCII content is needed, though this would be a complex change AND wouldn't target the initial motivation of this discussion (getting rid of the wchar_t type).
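To illustrate the hybrid idea, here is a very rough sketch of a string that stores ASCII-only text as 8-bit chars and falls back to UTF-16 otherwise. The class HybridString and its members are invented for this example; a real change would touch far more than this.

```cpp
#include <cstddef>
#include <string>

// Hypothetical hybrid string: 8-bit storage for ASCII-only text,
// UTF-16 storage as soon as any non-ASCII code unit appears.
class HybridString {
public:
    explicit HybridString(const std::u16string& text) {
        for (char16_t u : text) {
            if (u > 0x7F) {          // non-ASCII found: keep the UTF-16 form
                wide_ = text;
                isAscii_ = false;
                return;
            }
        }
        narrow_.reserve(text.size());
        for (char16_t u : text) narrow_.push_back(static_cast<char>(u));
    }

    bool isAscii() const { return isAscii_; }

    // Length in UTF-16 code units, identical for both representations.
    std::size_t length() const { return isAscii_ ? narrow_.size() : wide_.size(); }

    // Code unit at index i, widened on the fly for the ASCII case.
    char16_t at(std::size_t i) const {
        return isAscii_ ? static_cast<char16_t>(narrow_[i]) : wide_[i];
    }

    // Bytes used by the character data itself.
    std::size_t storageBytes() const {
        return isAscii_ ? narrow_.size() : wide_.size() * sizeof(char16_t);
    }

private:
    bool isAscii_ = true;
    std::string narrow_;     // used when isAscii_ is true
    std::u16string wide_;    // used when isAscii_ is false
};
```

Because length() and at() answer in UTF-16 code units either way, the observable behaviour stays the same while ASCII text takes half the space. Every consumer still has to handle the UTF-16 case, though, so the wide type does not go away.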
Also, I’m very skeptical that the very high usage of emoji in vernacular discussion hasn’t leaked into JS strings.