ability to validate UTF-8 encoding
In iconv:
> Iconv = require('iconv').Iconv
> i = new Iconv('UTF-8', 'UTF-8')
> i.convert(new Buffer([128]))
Error: Illegal character sequence.
In iconv-lite:
> iconv = require('iconv-lite')
> iconv.decode(new Buffer([128]), 'UTF-8')
'�'
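With iconv, invalid input raises an error, so you know whether the decoding succeeded. With iconv-lite, the invalid byte is silently replaced with '�' (U+FFFD), so there is no way to know whether the decoding succeeded.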
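Until something like this exists in iconv-lite, one possible workaround is to validate the bytes before decoding. The sketch below is only an illustration: `isValidUtf8` is a hypothetical helper name, not part of iconv or iconv-lite, and it assumes a Node.js build with ICU support (the default in current releases) so that the global `TextDecoder` accepts the `fatal` option.

```javascript
// Hypothetical helper, not part of iconv or iconv-lite.
// Returns true only if `buf` contains well-formed UTF-8.
function isValidUtf8(buf) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input
    // instead of substituting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUtf8(Buffer.from([128])));              // false: lone continuation byte
console.log(isValidUtf8(Buffer.from('héllo', 'utf8')));    // true
console.log(isValidUtf8(Buffer.from([0xEF, 0xBF, 0xBD]))); // true: a literal U+FFFD in the input
```

Checking the decoded string for '�' would not be equivalent, because the input may legitimately contain U+FFFD, as the last call shows.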
Top GitHub Comments
“�” is used for decoding, “?” for encoding (as many encodings cannot represent �).
Alexander Shtuchkin
On Mon, May 25, 2015 at 10:57 PM, Benjamin Pasero notifications@github.com wrote:
Yeah, that’s a completely separate issue from “ability to validate UTF-8 encoding” 😃
I have a callback-like mechanism in mind (see #53), where you can either throw exception or define replacements for characters that cannot be represented. Still haven’t got a moment to implement it though.
Alexander Shtuchkin
On Tue, May 26, 2015 at 10:25 AM, Benjamin Pasero notifications@github.com wrote: