UTF-8 isn't encoded/decoded correctly
msgpack.encode('🐦')
- Produces:
[166, 237, 160, 189, 237, 176, 166]
- Expected:
[164, 240, 159, 144, 166]
The UTF-16 surrogate pair is incorrectly encoded as two 3-byte sequences, one per surrogate, instead of a single 4-byte UTF-8 sequence for the code point U+1F426.
It seems intentional, given this comment (buffer-lite.js:21):
// JavaScript's string uses UTF-16 surrogate pairs for characters other than BMP.
// This encodes string as CESU-8 which never reaches 4 octets per character.
I don’t see the ability to encode CESU-8 instead of UTF-8 in the msgpack spec though. This will lead to interoperability issues with other msgpack implementations at best, crashing with incorrectly decoded codepoints at worst.
I wrote a plain JavaScript UTF-8 implementation before, will make a PR when I get a moment.
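To make the difference concrete, here is a minimal sketch in plain JavaScript. The function names utf8Encode and cesu8Encode are hypothetical and are not part of msgpack-lite; the sketch only illustrates how combining a surrogate pair before encoding yields the expected 4 bytes, while encoding each UTF-16 code unit separately yields the 6 bytes shown above. The msgpack str header byte (164 or 166) is not included.

```javascript
// Hypothetical helpers for illustration only; not msgpack-lite's API.

// Encode one numeric value (code point or code unit) as UTF-8-style bytes.
function pushEncoded(bytes, cp) {
  if (cp < 0x80) {
    bytes.push(cp);
  } else if (cp < 0x800) {
    bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
  } else if (cp < 0x10000) {
    bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
  } else {
    bytes.push(
      0xf0 | (cp >> 18),
      0x80 | ((cp >> 12) & 0x3f),
      0x80 | ((cp >> 6) & 0x3f),
      0x80 | (cp & 0x3f)
    );
  }
}

// UTF-8: combine each surrogate pair into one code point before encoding.
// Assumption: a lone surrogate is encoded as-is; a strict encoder would
// typically substitute U+FFFD instead.
function utf8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < str.length) {
      const lo = str.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++; // skip the low surrogate we just consumed
      }
    }
    pushEncoded(bytes, cp);
  }
  return bytes;
}

// CESU-8: encode every UTF-16 code unit on its own, so each surrogate
// becomes a separate 3-byte sequence.
function cesu8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    pushEncoded(bytes, str.charCodeAt(i));
  }
  return bytes;
}

console.log(utf8Encode('🐦'));  // 240, 159, 144, 166
console.log(cesu8Encode('🐦')); // 237, 160, 189, 237, 176, 166
```

In environments that provide TextEncoder, new TextEncoder().encode('🐦') should yield the same four bytes as utf8Encode here.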
Issue Analytics
- State:
- Created 7 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
0.1.26 published. Thanks!
Hope it gets fixed.