ability to validate UTF-8 encoding

In iconv:

> Iconv = require('iconv').Iconv
> i = new Iconv('UTF-8', 'UTF-8')
> i.convert(new Buffer([128]))
Error: Illegal character sequence.

In iconv-lite:

> iconv = require('iconv-lite')
> iconv.decode(new Buffer([128]), 'UTF-8')
'�'

With iconv, invalid input raises an error, so you know whether the decoding succeeded. With iconv-lite, the invalid byte is silently replaced with '�', so there is no way to know whether the decoding succeeded.
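One way to get the strict behaviour described above without either library is a sketch like the following, which relies on the `TextDecoder` built into Node.js with its `fatal` option. The helper name `isValidUtf8` is hypothetical, and this is not something iconv-lite itself provides.

```javascript
// Sketch: strict UTF-8 validation with Node's built-in TextDecoder.
// `isValidUtf8` is a hypothetical helper name, not part of iconv or iconv-lite.
function isValidUtf8(buf) {
  // With `fatal: true`, decode() throws instead of emitting U+FFFD.
  const decoder = new TextDecoder('utf-8', { fatal: true });
  try {
    decoder.decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUtf8(Buffer.from([128])));           // lone continuation byte
console.log(isValidUtf8(Buffer.from('héllo', 'utf8'))); // well-formed input
```

The check only reports whether the bytes are well-formed; the actual decoding can still be done with whichever library the application already uses.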

Issue Analytics

Created 9 years ago
Comments: 12 (6 by maintainers)

Top GitHub Comments

2 reactions

ashtuchkin commented, May 26, 2015

“�” is used for decoding, “?” for encoding (as many encodings cannot represent �).

Alexander Shtuchkin

On Mon, May 25, 2015 at 10:57 PM, Benjamin Pasero notifications@github.com wrote:

@ashtuchkin got it. But in the docs you say “Untranslatable characters are set to � or ?”, so it might also be “?”, which I cannot really check for because it’s a valid character.

— Reply to this email directly or view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/83#issuecomment-105406513 .
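The check discussed in this exchange, looking for “�” in decoded output, could be sketched roughly as below. `hasReplacementChar` is a hypothetical helper, and `Buffer#toString` stands in for `iconv.decode` so the example has no dependencies; both replace malformed UTF-8 bytes with U+FFFD.

```javascript
// Sketch: heuristic detection of decoding failures via the replacement character.
// `hasReplacementChar` is a hypothetical name used only for illustration.
const REPLACEMENT_CHAR = '\uFFFD';

function hasReplacementChar(decoded) {
  return decoded.includes(REPLACEMENT_CHAR);
}

// Stand-in for iconv.decode(): Buffer#toString also maps invalid bytes to U+FFFD.
const decoded = Buffer.from([104, 105, 128]).toString('utf8');
console.log(hasReplacementChar(decoded));
```

This is only a heuristic: input that legitimately contains an encoded U+FFFD is also flagged, so a positive result means "possibly invalid" rather than "definitely invalid".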

1 reaction

ashtuchkin commented, May 26, 2015

Yeah, that’s a completely separate issue from “ability to validate UTF-8 encoding” 😃

I have a callback-like mechanism in mind (see #53), where you can either throw an exception or define replacements for characters that cannot be represented. I still haven’t had a moment to implement it, though.

Alexander Shtuchkin

On Tue, May 26, 2015 at 10:25 AM, Benjamin Pasero notifications@github.com wrote:

In our case, we use encode() to convert an existing file to another encoding. E.g. we have a source UTF-16 file that we want to save in a DOS encoding. A DOS encoding cannot represent all characters (Chinese et al.), so I would like to show an error to the user. This is similar to Sublime Text, which will not let you save a file in an encoding that cannot represent all of its characters.

— Reply to this email directly or view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/83#issuecomment-105611321 .
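Until a mechanism like that exists, the "refuse to save" behaviour described in this comment could be approximated with a per-character round trip, sketched below. Everything here is an assumption for illustration: `findUnrepresentable` and the `codec` object shape are hypothetical, and Node's built-in `latin1` stands in for the DOS encoding. With iconv-lite, the codec would presumably wrap `iconv.encode` and `iconv.decode` for the target encoding.

```javascript
// Sketch: list the characters a target encoding cannot represent.
// `findUnrepresentable` and the `codec` shape are hypothetical, for illustration only.
function findUnrepresentable(text, codec) {
  const bad = new Set();
  for (const ch of text) { // iterates by code point, not UTF-16 code unit
    // A character that does not survive encode -> decode was lost or replaced.
    if (codec.decode(codec.encode(ch)) !== ch) bad.add(ch);
  }
  return [...bad];
}

// Stand-in codec built on Node's Buffer so the sketch has no dependencies.
const latin1 = {
  encode: (s) => Buffer.from(s, 'latin1'),
  decode: (b) => b.toString('latin1'),
};

const problems = findUnrepresentable('naïve 中文 text', latin1);
if (problems.length > 0) {
  console.log('Cannot save in this encoding; unrepresentable:', problems.join(' '));
}
```

Because the comparison is done per character, a literal “?” in the source round-trips cleanly and is not confused with a “?” substituted for an unrepresentable character.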
