ability to validate UTF-8 encoding

In iconv:

> Iconv = require('iconv').Iconv
> i = new Iconv('UTF-8', 'UTF-8')
> i.convert(Buffer.from([128]))
Error: Illegal character sequence.

In iconv-lite:

> iconv = require('iconv-lite')
> iconv.decode(Buffer.from([128]), 'UTF-8')
'�'

With iconv, the thrown error tells you that the decoding failed. With iconv-lite, the invalid byte is silently replaced with “�”, so there is no way to know whether the decoding succeeded.
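
Until iconv-lite can report invalid input itself, one possible workaround is to validate the bytes separately before decoding. The sketch below uses the TextDecoder that ships with Node.js, which throws on malformed input when created with fatal: true. The helper name isValidUtf8 is made up for this example and is not part of iconv or iconv-lite, and the sketch assumes a Node.js build with ICU support (the default for official builds).

```javascript
// Hypothetical helper (not part of iconv-lite): strict UTF-8 validation
// using the TextDecoder built into Node.js.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input
    // instead of substituting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// 0x80 is a lone continuation byte, the same input as in the report above.
console.log(isValidUtf8(Buffer.from([128])));                // false
console.log(isValidUtf8(Buffer.from('h\u00e9llo', 'utf8'))); // true
```

Searching the decoded string for “�” is not a reliable substitute, because U+FFFD can legitimately appear in valid input.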

Issue Analytics

  • State: open
  • Created 9 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

2 reactions
ashtuchkin commented, May 26, 2015

“�” is used for decoding, “?” for encoding (as many encodings cannot represent �).

Alexander Shtuchkin

On Mon, May 25, 2015 at 10:57 PM, Benjamin Pasero notifications@github.com wrote:

@ashtuchkin https://github.com/ashtuchkin got it. But in the docs you say “Untranslatable characters are set to � or ?”, so it might also be “?”, which I cannot really check for because it’s a valid character.

— Reply to this email directly or view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/83#issuecomment-105406513 .

1 reaction
ashtuchkin commented, May 26, 2015

Yeah, that’s a completely separate issue from “ability to validate UTF-8 encoding” 😃

I have a callback-like mechanism in mind (see #53), where you can either throw an exception or define replacements for characters that cannot be represented. Still haven’t had a moment to implement it, though.

Alexander Shtuchkin

On Tue, May 26, 2015 at 10:25 AM, Benjamin Pasero notifications@github.com wrote:

In our case, we use encode() to convert an existing file to another encoding. E.g. we have a source UTF-16 file that we want to save in a DOS encoding. DOS encodings cannot represent all characters (Chinese et al.), so I would like to show an error to the user. This is similar to Sublime Text, which will not let you save a file in an encoding that cannot represent all of its characters.
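
One way to approximate this check without a callback mechanism is a round trip: encode the text, decode the result, and compare it with the original. The sketch below is only an illustration. canRoundTrip is a made-up helper name, and the demo codec uses Node's built-in latin1 support so that it runs without dependencies. With iconv-lite, the two functions could presumably be (s) => iconv.encode(s, enc) and (b) => iconv.decode(b, enc).

```javascript
// Hypothetical helper: detect lossy encoding by round-tripping the text.
// The encode and decode functions are supplied by the caller.
function canRoundTrip(text, encode, decode) {
  // If any character was replaced or mangled while encoding, decoding
  // the bytes will not reproduce the original string.
  return decode(encode(text)) === text;
}

// Demo codec based on Node's built-in latin1 encoding (an assumption made
// for this example; it stands in for a single-byte DOS code page).
const toLatin1 = (s) => Buffer.from(s, 'latin1');
const fromLatin1 = (b) => b.toString('latin1');

console.log(canRoundTrip('caf\u00e9', toLatin1, fromLatin1));     // true
console.log(canRoundTrip('\u4e2d\u6587', toLatin1, fromLatin1));  // false
```

A literal “?” in the source text round-trips unchanged, so this approach does not confuse it with a replacement character.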

— Reply to this email directly or view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/83#issuecomment-105611321 .
