URLs with %FX percentage encoding not forwarded via Port :8181 to ReGa but result in "400 Bad Request"

Describe the issue you are experiencing

Running the following curl command to write a text containing the umlaut “ö” to the system variable “TestRM” returns a 400 Bad Request error:

# curl -vvv http://localhost:8181/hm.exe?value=dom.GetObject%28ID_SYSTEM_VARIABLES%29.Get%28%27TestRM%27%29.State%28%27Fl%F6te%27%29
> GET /hm.exe?value=dom.GetObject%28ID_SYSTEM_VARIABLES%29.Get%28%27TestRM%27%29.State%28%27Fl%F6te%27%29 HTTP/1.1
> Host: localhost:8181
> User-Agent: curl/7.79.1
> Accept: */*
> 
< HTTP/1.1 400 Bad Request
< Content-Type: text/html
< Content-Length: 345
< Connection: close
< Date: Mon, 09 May 2022 13:17:34 GMT
< 
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>400 Bad Request</title>
 </head>
 <body>
  <h1>400 Bad Request</h1>
 </body>
</html>

When the %F6 is removed from the above URL or replaced with a different percent-encoded character, no Bad Request error is returned and the request is handled correctly, i.e. it is forwarded to ReGaHss via the proxy statement.

Describe the behavior you expected

The request should be forwarded internally to port 8183, the ReGa scripting port that lighttpd is proxying.

Steps to reproduce the issue

  1. Log in to CCU/RaspberryMatic via SSH
  2. Create a text system variable “TestRM”
  3. Execute curl -vvv http://localhost:8181/hm.exe?value=dom.GetObject%28ID_SYSTEM_VARIABLES%29.Get%28%27TestRM%27%29.State%28%27Fl%F6te%27%29
  4. Monitor the output

What is the version this bug report is based on?

3.63.9.20220430

Which base platform are you running?

ova (Open Virtual Infrastructure)

Which HomeMatic/homematicIP radio module are you using?

n/a

Anything in the logs that might be useful for us?

The same issue can be seen when using %FC in the URL.

Please note that any %Fx percent-encoding seems to trigger the issue. Thus neither “%FC” nor any other “%Fx” encoding, with x being any hex digit, seems to be usable.

Additional information

After a short investigation, the issue seems to be related to newer lighttpd versions normalizing URLs to some extent, which can result in a 400 Bad Request for certain reserved characters. See here for more information/docs on that matter:

https://redmine.lighttpd.net/projects/lighttpd/wiki/Server_http-parseoptsDetails

After adding the "url-normalize" => "disable" statement the issue seems to be gone.

This also refs https://homematic-forum.de/forum/viewtopic.php?p=717789#p717791

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

gstrauss commented, May 9, 2022

> In previous versions of lighttpd this wasn’t really an issue, as it forwarded the %XX-encoded URL portions verbatim without touching them. But at some point this seems to have changed, and now (as shown here) %FC or %F6 cannot be used anymore, even though they are part of ISO-8859-1 (German umlauts).

To be clear, this is not a limitation of UTF-8. The entirety of ISO-8859-1 (single-byte encoding) and much more is able to be properly encoded in UTF-8 (multi-byte encoding).
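
As a rough illustration of this point, here is a small Python sketch (Python is not part of the original setup; the sample word comes from the curl command in the report). It shows that the same text percent-encodes differently depending on the charset, and that the ISO-8859-1 form does not decode as valid UTF-8. The helper is_valid_utf8 is hypothetical and is not how lighttpd implements its check.

```python
from urllib.parse import quote, unquote_to_bytes

text = "Flöte"  # sample value from the curl command in the report

# Percent-encode the same string from its ISO-8859-1 bytes and from its UTF-8 bytes.
latin1_form = quote(text, encoding="iso-8859-1")  # 'Fl%F6te'
utf8_form = quote(text, encoding="utf-8")         # 'Fl%C3%B6te'


def is_valid_utf8(percent_encoded: str) -> bool:
    """Hypothetical helper: True if the percent-decoded bytes form valid UTF-8."""
    try:
        unquote_to_bytes(percent_encoded).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_form, is_valid_utf8(latin1_form))
print(utf8_form, is_valid_utf8(utf8_form))
```

The byte 0xF6 on its own is not a valid UTF-8 sequence, whereas the two-byte sequence 0xC3 0xB6 is the UTF-8 encoding of “ö”.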

Also, this is not a bug in lighttpd. The lighttpd behavior is explicit and intentional to support a stronger security stance.

The option for url normalization was added in lighttpd 1.4.50, released in Aug 2018. After multiple announcements in release notes for subsequent releases, url normalization was enabled by default in lighttpd 1.4.54, released May 2019 (3 years ago).

(<generic rant>) It is a frequent lament of mine and other open source developers that numerous popular linux distros are negligent (IMNSHO) in upgrading software in reasonable timeframes, and upgrading “stable” distros – a.k.a. “nearly unmaintained” because “nearly unchanging” – is not made easier for end-users, many of whom do not manually perform periodic upgrades on a regular basis on their own. There are many good reasons why browsers and other desktop software run tasks which check for and perform upgrades. Similarly, automatic software upgrades (at least for security patches) is the default behavior for iOS and Android mobile devices. (</generic rant>)

>> Why iso-8859-1? Why is it not converted to UTF-8 before url-encoding (%xx)?

> Well, the application which finally receives the HTTP request (behind the lighttpd proxy) is a very old-school (199x) application that can only really speak ISO-8859-1 and thus also expects URLs to be percent-encoded as ISO-8859-1 and not UTF-8.

What is producing the URL? What is percent-encoding the problematic ISO-8859-1 string? Is it the ancient app or is it something else? Whatever it is that is doing the percent-encoding, it might be appropriate to modify that code to convert the string to UTF-8 prior to percent-encoding. Then, you could use lighttpd mod_magnet for requests only to this application, percent-decode the query-string, convert to ISO-8859-1, and re-percent-encode before passing the request on to the application. Alternatively, modify the application to base64-encode/decode the iso-8859-1 string into the URL query-string instead of selective percent-encoding. Alternatively, use POST to send the data instead of using query-string. Alternatively, use an HTTP header instead of using the query-string, and use "header-strict" => "disable" while leaving "url-normalize" => "enable".
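
In practice the decode, convert, and re-encode step suggested here would live in a lighttpd mod_magnet (Lua) script. The Python sketch below only illustrates the transformation itself. The function name recode_utf8_to_latin1 is hypothetical, and the sketch assumes that the whole query value can be decoded and re-encoded as a single unit.

```python
from urllib.parse import quote, unquote


def recode_utf8_to_latin1(value: str) -> str:
    """Turn a UTF-8 percent-encoded value into an ISO-8859-1 percent-encoded one."""
    # Percent-decode, interpreting the bytes as UTF-8.
    # Invalid UTF-8 input raises UnicodeDecodeError.
    text = unquote(value, encoding="utf-8", errors="strict")
    # Percent-encode again from the ISO-8859-1 bytes.
    # Characters outside ISO-8859-1 raise UnicodeEncodeError.
    return quote(text, safe="", encoding="iso-8859-1", errors="strict")


print(recode_utf8_to_latin1("State%28%27Fl%C3%B6te%27%29"))
```

With this approach the client can send UTF-8 (Fl%C3%B6te) so that URL normalization stays enabled, while the application behind the proxy still receives the ISO-8859-1 form (Fl%F6te) it expects.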

Best practices strongly recommend UTF-8 in URIs since there is no generic way to convey the charset used in the URI, whereas there are alternatives to specify charset for strings in HTTP headers. Search for “UTF-8” in https://html.spec.whatwg.org/ and note the frequency of the explicit phrase “UTF-8 percent-encode” https://url.spec.whatwg.org/#string-utf-8-percent-encode

> Nevertheless, setting "url-normalize" => "disable" in lighttpd.conf currently solves this issue. The question remains, however: is this really the way to go, and which potential side-effects could this have?

No, it is not “the way to go” for a security-focused solution. A security-focused solution would address the source of the issue: a 199x application that does not support UTF-8.

Instead, it is a “quick-fix” workaround to use "url-normalize" => "disable" to disable lighttpd URL normalization (which includes detection of invalid UTF-8 in percent-encodings).

> which potential side-effects this could have?

The lighttpd core provides the "url-normalize" option and works with or without that feature enabled. However, depending on your lighttpd.conf configuration, you might not get the behavior you expect in some situations. Your mod_rewrite and mod_redirect rules might not match for non-normalized alternative encodings. Best practices suggest writing explicit allow rules, and then deny everything else, but I am sure there are many custom lighttpd.conf instances that do otherwise.

Here is one example of unwanted behavior that is fixed with url normalization: https://redmine.lighttpd.net/issues/1720


tl;dr: if fixing the application is not an option, then using "url-normalize" => "disable" is a workaround, though disabling the url normalization may cause lighttpd.conf configured behavior to change for non-normalized URIs, which in turn may have security implications for your specific environment.

If fixing the application is an option, who would do that and where is the code? Here is an example for charset conversion for Python and Java: https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/

Given the 3 years that have elapsed since lighttpd enabled url-normalization by default, is it worthwhile for me to even consider any enhancements to lighttpd url-normalization options, which may take another 3-4 years to reach end-users?

gstrauss commented, May 25, 2022

On my development branch, where things might change, the commit may be cherry-picked and applied to lighttpd-1.4.64.

https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master
https://git.lighttpd.net/lighttpd/lighttpd1.4/commit/a01e62bb7d562d2176e5fc50811f0b22b30cdfa1

The default is server.http-parseopts += ("url-invalid-utf8-reject" => "enable") to preserve existing behavior.
