[BUG]: unicode handling maybe broken
Describe the bug
Someone posted https://godbolt.org/z/fdWx983sa in a tweet; it tries to print some
Unicode characters with std::wcout.
Steps to reproduce
Click the link and see the output is SomeText F F
Expected behavior
The output should contain Unicode characters, something like SomeText Φ Φ
Reproduction link
https://godbolt.org/z/fdWx983sa
Screenshots
Not applicable
Operating System
No response
Browser version
No response
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (7 by maintainers)
Top GitHub Comments
Dug a bit into cppreference and the GCC testsuite, and they basically do this too when testing std::wcout with Unicode characters, e.g. https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/libstdc%2B%2B-v3/testsuite/27_io/objects/wchar_t/11.cc
They only seem to not use it in situations with just ASCII.
There are also examples on the internet calling std::wcout.imbue(loc), but that only applies to language-related formatting as far as I can tell. The global() call also affects the encoding/decoding.
It might be that it's a GCC bug that they haven't noticed because it's never tested without global(), but that's just speculation from my perspective. I don't know what the standard says on this subject, but I bet it's one of those "implementation defined" kind of things.
fmt::print also starts working with the addition of setlocale.
https://godbolt.org/z/xKWjTs6z7