
Why must the urlencoded charset be UTF-8?


It looks like there’s an unnecessary charset check in lib/types/urlencoded.js:

// assert charset
var charset = getCharset(req) || 'utf-8'
if (charset !== 'utf-8') {
  debug('invalid charset')
  next(createError(415, 'unsupported charset "' + charset.toUpperCase() + '"', {
    charset: charset
  }))
  return
}

That value is eventually passed to the read function (defined in lib/read.js), which can handle encodings other than UTF-8 as long as iconv supports them:

// assert charset is supported
if (opts.encoding === null && encoding !== null && !iconv.encodingExists(encoding)) {
  return next(createError(415, 'unsupported charset "' + encoding.toUpperCase() + '"', {
    charset: encoding.toLowerCase()
  }))
}
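A minimal sketch of the point above, assuming Node’s built-in Buffer decoders as a stand-in for iconv-lite (body-parser itself uses iconv-lite; the decodeBody helper and its small charset table are hypothetical). The same bytes decode differently depending on the declared charset, and only charsets that have no decoder would need a 415 response:

```javascript
'use strict'

// Hypothetical stand-in for iconv.encodingExists()/iconv.decode():
// maps charset labels to encodings that Node's Buffer decodes natively.
const SUPPORTED = {
  'utf-8': 'utf8',
  'iso-8859-1': 'latin1',
  'us-ascii': 'ascii'
}

// Decode a raw request body using the charset from the Content-Type header,
// rejecting only charsets we have no decoder for (415-style error).
function decodeBody (buf, charset) {
  const label = (charset || 'utf-8').toLowerCase()
  const encoding = SUPPORTED[label]
  if (encoding === undefined) {
    const err = new Error('unsupported charset "' + label.toUpperCase() + '"')
    err.status = 415
    err.charset = label
    throw err
  }
  return buf.toString(encoding)
}

const bytes = Buffer.from([0x4a, 0x6f, 0x73, 0xe9]) // "José" in ISO-8859-1
console.log(decodeBody(bytes, 'ISO-8859-1')) // -> José
console.log(decodeBody(bytes, 'utf-8')) // -> 'Jos\uFFFD' (0xE9 alone is not valid UTF-8)
```

This covers only the raw-body step: it decodes bytes as they arrive and says nothing about how percent-escapes inside the body are interpreted.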

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 1
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

2 reactions
intellix commented, Mar 10, 2019

@rodic, are those Paysafecard payment callbacks by any chance? I’m having the same issue.

The header being sent to me is:

Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1

I understand you’re saying there’s no spec for detecting the charset, but what if the header states the charset explicitly, so no detection is needed? Could the request be allowed through then?
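A minimal sketch of reading an explicit charset out of a header like the one above. body-parser’s own getCharset appears to delegate this to the content-type package; the hand-rolled getCharsetFromHeader below is a hypothetical simplification that does not handle escaped characters inside quoted strings:

```javascript
'use strict'

// Hypothetical helper: return the lower-cased charset parameter of a
// Content-Type header value, or undefined when none is present.
// Simplified: splits on ';' and strips one pair of surrounding quotes.
function getCharsetFromHeader (header) {
  if (typeof header !== 'string') return undefined
  const params = header.split(';').slice(1) // skip the media type itself
  for (const param of params) {
    const eq = param.indexOf('=')
    if (eq === -1) continue
    const name = param.slice(0, eq).trim().toLowerCase()
    let value = param.slice(eq + 1).trim()
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1)
    }
    if (name === 'charset') return value.toLowerCase()
  }
  return undefined
}

console.log(getCharsetFromHeader('application/x-www-form-urlencoded; charset=ISO-8859-1'))
// -> iso-8859-1
```

With the header spelled out like this, no detection heuristics are involved; the remaining question is what the charset should be applied to.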

Using patch-package (https://github.com/ds300/patch-package), you can work around it like so:

patches/body-parser+1.18.3.patch

diff --git a/node_modules/body-parser/lib/types/urlencoded.js b/node_modules/body-parser/lib/types/urlencoded.js
index 5ccda21..02f22e0 100644
--- a/node_modules/body-parser/lib/types/urlencoded.js
+++ b/node_modules/body-parser/lib/types/urlencoded.js
@@ -101,16 +101,8 @@ function urlencoded (options) {
       return
     }
 
-    // assert charset
+    // get charset
     var charset = getCharset(req) || 'utf-8'
-    if (charset !== 'utf-8') {
-      debug('invalid charset')
-      next(createError(415, 'unsupported charset "' + charset.toUpperCase() + '"', {
-        charset: charset,
-        type: 'charset.unsupported'
-      }))
-      return
-    }
 
     // read
     read(req, res, next, parse, debug, {

Hope it helps someone else.

1 reaction
dougwilson commented, Apr 5, 2017

Hi @rodic, normally that type never has a charset in the header at all; it’s only present in some buggy versions of Firefox. The spec says the charset is instead specified in a magic _charset_ parameter value, which is sent when you include it as a hidden input element on your form.
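A minimal sketch of the _charset_ convention, assuming the form includes a hidden input named _charset_ so the browser fills in the encoding it used. The detectFormCharset helper and its UTF-8 fallback are hypothetical. Because the _charset_ value is itself plain ASCII, Node’s built-in URLSearchParams can locate it even when other fields were encoded in a different charset:

```javascript
'use strict'

// Hypothetical helper: find the charset a browser reported through a hidden
// "_charset_" form field, falling back to UTF-8 when the field is absent or empty.
// URLSearchParams assumes UTF-8 for percent-escapes, but the field's own value
// is ASCII, so it is read correctly regardless of the other fields.
function detectFormCharset (body, fallback = 'utf-8') {
  const reported = new URLSearchParams(body).get('_charset_')
  return reported ? reported.trim().toLowerCase() : fallback
}

console.log(detectFormCharset('name=Jos%E9&_charset_=ISO-8859-1')) // -> iso-8859-1
console.log(detectFormCharset('name=Jos%C3%A9')) // -> utf-8
```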

I’m not working on implementing it, so you’re welcome to. Keep in mind that this charset is not what gets passed to raw-body; it needs to go to (and thus be supported by) the querystring and qs modules, since it applies to the URL-decoding (the percent-decoding), not to the raw unencoded characters (which are technically required to be US-ASCII only).
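The distinction above (the charset governs percent-decoding, not the raw bytes) can be sketched as follows. This is not how body-parser, querystring, or qs implement it; parseUrlencoded, percentDecodeToBytes, and the charset table are hypothetical, and Node’s Buffer decoders stand in for iconv-lite:

```javascript
'use strict'

// Charset labels mapped to Node's built-in Buffer encodings (illustrative only).
const BUFFER_ENCODINGS = {
  'utf-8': 'utf8',
  'iso-8859-1': 'latin1',
  'us-ascii': 'ascii'
}

// Turn one urlencoded component into raw bytes: '+' is a space, '%XX' is the
// byte 0xXX, and any other character must be US-ASCII and is copied as-is.
function percentDecodeToBytes (component) {
  const bytes = []
  for (let i = 0; i < component.length; i++) {
    const code = component.charCodeAt(i)
    const hex = component.slice(i + 1, i + 3)
    if (code === 0x2b) { // '+'
      bytes.push(0x20)
    } else if (code === 0x25 && /^[0-9A-Fa-f]{2}$/.test(hex)) { // '%XX'
      bytes.push(parseInt(hex, 16))
      i += 2
    } else if (code <= 0x7f) {
      bytes.push(code)
    } else {
      throw new TypeError('raw non-ASCII character in urlencoded body')
    }
  }
  return Buffer.from(bytes)
}

// Parse a urlencoded body, interpreting the percent-decoded bytes in `charset`.
// Later duplicate keys overwrite earlier ones.
function parseUrlencoded (body, charset = 'utf-8') {
  const encoding = BUFFER_ENCODINGS[charset.toLowerCase()]
  if (encoding === undefined) {
    throw new Error('unsupported charset "' + charset.toUpperCase() + '"')
  }
  const decode = (s) => percentDecodeToBytes(s).toString(encoding)
  const result = Object.create(null) // no prototype, so "__proto__" is just a key
  for (const pair of body.split('&')) {
    if (pair === '') continue
    const eq = pair.indexOf('=')
    const key = eq === -1 ? pair : pair.slice(0, eq)
    const value = eq === -1 ? '' : pair.slice(eq + 1)
    result[decode(key)] = decode(value)
  }
  return result
}

console.log(parseUrlencoded('name=Jos%E9', 'ISO-8859-1').name) // -> José
console.log(parseUrlencoded('name=Jos%C3%A9', 'utf-8').name) // -> José
```

Decoding the same name=Jos%E9 body as UTF-8 would instead yield a U+FFFD replacement character, so the charset has to reach whichever step performs the percent-decoding; decoding the ASCII raw body with it changes nothing.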
