URLs with %FX percentage encoding not forwarded via Port :8181 to ReGa but result in "400 Bad Request"
Describe the issue you are experiencing
Running the following curl command to write a text containing the umlaut “ö” to the system variable “TestRM” returns a 400 Bad Request error:
# curl -vvv http://localhost:8181/hm.exe?value=dom.GetObject%28ID_SYSTEM_VARIABLES%29.Get%28%27TestRM%27%29.State%28%27Fl%F6te%27%29
> GET /hm.exe?value=dom.GetObject%28ID_SYSTEM_VARIABLES%29.Get%28%27TestRM%27%29.State%28%27Fl%F6te%27%29 HTTP/1.1
> Host: localhost:8181
> User-Agent: curl/7.79.1
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Type: text/html
< Content-Length: 345
< Connection: close
< Date: Mon, 09 May 2022 13:17:34 GMT
<
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>400 Bad Request</title>
</head>
<body>
<h1>400 Bad Request</h1>
</body>
</html>
When the %F6 in the above URL is removed or replaced with some other character, no Bad Request error is returned and the request is handled correctly, i.e. forwarded to ReGaHss via the lighttpd proxy statement.
Describe the behavior you expected
The request should be forwarded internally to port 8183, the ReGa scripting port that lighttpd proxies.
Steps to reproduce the issue
- Log in to CCU/RaspberryMatic via SSH
- Create a text system variable “TestRM”
- Execute
curl -vvv http://localhost:8181/hm.exe?value=dom.GetObject%28ID_SYSTEM_VARIABLES%29.Get%28%27TestRM%27%29.State%28%27Fl%F6te%27%29
- Monitor the output
What is the version this bug report is based on?
3.63.9.20220430
Which base platform are you running?
ova (Open Virtual Infrastructure)
Which HomeMatic/homematicIP radio module are you using?
n/a
Anything in the logs that might be useful for us?
The same issue can be seen when using %FC in the URL.
Please note that any %Fx percent-encoding, with x being any hex digit, seems to trigger the issue. Thus neither “%FC” nor any other “%Fx” sequence seems to be possible.
Additional information
After a short investigation, the issue seems to be related to newer lighttpd versions normalizing URLs to some extent, which can result in a 400 Bad Request for certain reserved characters. See here for more information/docs on that matter:
https://redmine.lighttpd.net/projects/lighttpd/wiki/Server_http-parseoptsDetails
After adding the "url-normalize" => "disable" statement, the issue seems to be gone.
This also refs https://homematic-forum.de/forum/viewtopic.php?p=717789#p717791
Top GitHub Comments
To be clear, this is not a limitation of UTF-8. The entirety of ISO-8859-1 (single-byte encoding) and much more is able to be properly encoded in UTF-8 (multi-byte encoding).
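As a minimal illustration of the point above, the following Python sketch percent-encodes the string "Flöte" from the report under both charsets and checks whether the decoded bytes are valid UTF-8. The helper name `is_valid_utf8` is made up for this example, and Python's strict decoder is only a stand-in: it is not necessarily the same check that lighttpd performs during URL normalization.

```python
from urllib.parse import quote, unquote_to_bytes

text = "Flöte"

# Percent-encode the same string under both charsets.
latin1_encoded = quote(text, encoding="iso-8859-1")  # "ö" is the single byte 0xF6
utf8_encoded = quote(text, encoding="utf-8")         # "ö" is the two bytes 0xC3 0xB6


def is_valid_utf8(percent_encoded: str) -> bool:
    """Hypothetical helper: True if the percent-decoded bytes are valid UTF-8."""
    try:
        unquote_to_bytes(percent_encoded).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_encoded, is_valid_utf8(latin1_encoded))  # Fl%F6te False
print(utf8_encoded, is_valid_utf8(utf8_encoded))      # Fl%C3%B6te True
```

The byte 0xF6 can never occur in well-formed UTF-8, which is why `Fl%F6te` fails the check while `Fl%C3%B6te` passes.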
Also, this is not a bug in lighttpd. The lighttpd behavior is explicit and intentional to support a stronger security stance.
The option for url normalization was added in lighttpd 1.4.50, released in Aug 2018. After multiple announcements in release notes for subsequent releases, url normalization was enabled by default in lighttpd 1.4.54, released May 2019 (3 years ago).
(<generic rant>) It is a frequent lament of mine and other open source developers that numerous popular linux distros are negligent (IMNSHO) in upgrading software in reasonable timeframes, and upgrading “stable” distros – a.k.a. “nearly unmaintained” because “nearly unchanging” – is not made easier for end-users, many of whom do not manually perform periodic upgrades on a regular basis on their own. There are many good reasons why browsers and other desktop software run tasks which check for and perform upgrades. Similarly, automatic software upgrades (at least for security patches) is the default behavior for iOS and Android mobile devices. (</generic rant>)
What is producing the URL? What is percent-encoding the problematic ISO-8859-1 string? Is it the ancient app or is it something else? Whatever it is that is doing the percent-encoding, it might be appropriate to modify that code to convert the string to UTF-8 prior to percent-encoding. Then, you could use lighttpd mod_magnet for requests only to this application, percent-decode the query-string, convert to ISO-8859-1, and re-percent-encode before passing the request on to the application. Alternatively, modify the application to base64-encode/decode the ISO-8859-1 string into the URL query-string instead of selective percent-encoding. Alternatively, use POST to send the data instead of using query-string. Alternatively, use an HTTP header instead of using the query-string, and use "header-strict" => "disable" while leaving "url-normalize" => "enable".
Best practices strongly recommend UTF-8 in URIs since there is no generic way to convey the charset used in the URI, whereas there are alternatives to specify charset for strings in HTTP headers. Search for “UTF-8” in https://html.spec.whatwg.org/ and note the frequency of the explicit phrase “UTF-8 percent-encode” https://url.spec.whatwg.org/#string-utf-8-percent-encode
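The transcoding step suggested above (percent-decode, convert to ISO-8859-1, re-percent-encode) can be sketched in Python as follows. This only illustrates the transformation logic. It is not mod_magnet code (mod_magnet scripts are written in Lua), and the function names `transcode_component` and `transcode_query` are hypothetical.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def transcode_component(component: str) -> str:
    """Convert one UTF-8 percent-encoded component to ISO-8859-1 percent-encoding."""
    text = unquote_to_bytes(component).decode("utf-8")
    # Raises UnicodeEncodeError for characters that have no ISO-8859-1 representation.
    legacy_bytes = text.encode("iso-8859-1")
    # safe="" re-encodes everything except letters, digits and "_.-~".
    return quote_from_bytes(legacy_bytes, safe="")


def transcode_query(query: str) -> str:
    """Transcode every key and value of a query string, keeping '&' and '=' intact."""
    pairs = []
    for pair in query.split("&"):
        key, sep, value = pair.partition("=")
        pairs.append(transcode_component(key) + sep + transcode_component(value))
    return "&".join(pairs)


utf8_query = (
    "value=dom.GetObject%28ID_SYSTEM_VARIABLES%29"
    ".Get%28%27TestRM%27%29.State%28%27Fl%C3%B6te%27%29"
)
print(transcode_query(utf8_query))
```

For the query from the report, the result ends in `State%28%27Fl%F6te%27%29`, i.e. the form the legacy application expects. The sketch does not treat a literal `+` as a space, so form-style encoding would need extra handling.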
No, it is not “the way to go” for a security-focused solution. A security-focused solution would address the source of the issue: a 199x application that does not support UTF-8.
Instead, it is a “quick-fix” workaround to use "url-normalize" => "disable" to disable lighttpd URL normalization (which includes detection of invalid UTF-8 in percent-encodings).
The lighttpd core provides the "url-normalize" option and works with or without that feature enabled. However, depending on your lighttpd.conf configuration, you might not get the behavior you expect in some situations. Your mod_rewrite and mod_redirect rules might not match for non-normalized alternative encodings. Best practices suggest writing explicit allow rules, and then deny everything else, but I am sure there are many custom lighttpd.conf instances that do otherwise.
Here is one example of unwanted behavior that is fixed with url normalization: https://redmine.lighttpd.net/issues/1720
tl;dr: if fixing the application is not an option, then using "url-normalize" => "disable" is a workaround, though disabling the url normalization may cause lighttpd.conf configured behavior to change for non-normalized URIs, which in turn may have security implications for your specific environment.
If fixing the application is an option, who would do that and where is the code? Here is an example for charset conversion for Python and Java: https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/
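For the producer side, a minimal Python sketch of such a charset conversion might look like this. It assumes the existing string is percent-encoded ISO-8859-1, and the function name `latin1_percent_to_utf8_percent` is made up for this example.

```python
from urllib.parse import quote, unquote_to_bytes


def latin1_percent_to_utf8_percent(component: str) -> str:
    """Re-encode an ISO-8859-1 percent-encoded component as UTF-8 percent-encoding."""
    # Every byte value maps to a character in ISO-8859-1, so this decode cannot fail.
    text = unquote_to_bytes(component).decode("iso-8859-1")
    return quote(text, safe="", encoding="utf-8")


print(latin1_percent_to_utf8_percent("Fl%F6te"))  # Fl%C3%B6te
```

The resulting URL contains only valid UTF-8, though the receiving application would then have to accept UTF-8 or have the value converted back before it is processed.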
Given the 3 years that have elapsed since lighttpd enabled url-normalization by default, is it worthwhile for me to even consider any enhancements to lighttpd url-normalization options, which may take another 3-4 years to reach end-users?
On my development branch, where things might change, the commit may be cherry-picked and applied to lighttpd-1.4.64.
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/personal/gstrauss/master
https://git.lighttpd.net/lighttpd/lighttpd1.4/commit/a01e62bb7d562d2176e5fc50811f0b22b30cdfa1
The default is server.http-parseopts += ("url-invalid-utf8-reject" => "enable") to preserve existing behavior.