HTTP/2.0 non US-ASCII header names should be rejected

Expected behavior

According to the HTTP/2 RFC (RFC 7540, Section 8.1.2):

Just as in HTTP/1.x, header field names are strings of ASCII characters that are compared in a case-insensitive fashion. However, header field names MUST be converted to lowercase prior to their encoding in HTTP/2. A request or response containing uppercase header field names MUST be treated as malformed (Section 8.1.2.6).

Since it refers to the HTTP/1.x RFC, here is the relevant part (RFC 7230, Section 3):

A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII [USASCII]. Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet LF (%x0A).

US-ASCII uses only 7 bits. During header name parsing, any byte of the form 1xxx xxxx (i.e., byteValue < 0 for a signed Java byte) should therefore be treated as an error, and the request should be rejected.
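To illustrate the point about the high bit, here is a minimal sketch using only the JDK. The class and method names (`AsciiByteCheck`, `isUsAscii`) are hypothetical and not part of Netty; the sketch only shows why a signed comparison is enough to detect a non-ASCII byte.

```java
final class AsciiByteCheck {
    /** Returns true when the byte is in the 7-bit US-ASCII range (0x00 to 0x7F). */
    static boolean isUsAscii(byte value) {
        // Java bytes are signed, so any byte with the high bit set (1xxx xxxx) is negative.
        return value >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isUsAscii((byte) 'a'));  // true
        System.out.println(isUsAscii((byte) 0x7F)); // true
        System.out.println(isUsAscii((byte) 0x80)); // false
        System.out.println(isUsAscii((byte) 0xF0)); // false
    }
}
```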

As a reminder, here is the header name validator used in Netty for HTTP/1.1:

private static void validateHeaderNameElement(byte value) {
        switch (value) {
        case 0x1c:
        case 0x1d:
        case 0x1e:
        case 0x1f:
        case 0x00:
        case '\t':
        case '\n':
        case 0x0b:
        case '\f':
        case '\r':
        case ' ':
        case ',':
        case ':':
        case ';':
        case '=':
            throw new IllegalArgumentException(
               "a header name cannot contain the following prohibited characters: =,;: \\t\\r\\n\\v\\f: " +
                       value);
        default:
            // Check to see if the character is not an ASCII character, or invalid
            if (value < 0) {
                throw new IllegalArgumentException("a header name cannot contain non-ASCII character: " + value);
            }
        }
}

Actual behavior

The only validation performed on header names is:

private static final ByteProcessor HTTP2_NAME_VALIDATOR_PROCESSOR = new ByteProcessor() {
        @Override
        public boolean process(byte value) {
            return !isUpperCase(value); // isUpperCase : return value >= 'A' && value <= 'Z';
        }
    };

Thus non-ASCII bytes are accepted and then corrupted when converted into a String (i.e., each byte becomes a 2-byte UTF-16BE char).

For example, the non-ASCII byte 0xf0 is converted into 0x00 0xf0.
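The conversion described above can be reproduced with the JDK alone. This is a sketch that assumes the bytes are decoded one-to-one as ISO-8859-1 (Latin-1), which matches the behaviour described in this issue; it does not call Netty's AsciiString, and the names `Latin1WideningDemo` and `decodeAsLatin1` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Latin1WideningDemo {
    /** Decodes each byte into exactly one char, as ISO-8859-1 (Latin-1) decoding does. */
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xF0};
        String decoded = decodeAsLatin1(raw);

        // The single byte 0xF0 becomes the char U+00F0, a 16-bit UTF-16 code unit.
        System.out.printf("U+%04X%n", (int) decoded.charAt(0)); // prints U+00F0

        // Re-encoding that char as UTF-16BE shows the two bytes mentioned above.
        byte[] utf16 = decoded.getBytes(StandardCharsets.UTF_16BE);
        System.out.printf("%02X %02X%n", utf16[0], utf16[1]); // prints 00 F0
    }
}
```

Whether this counts as corruption depends on the encoding the caller expects, which is the point debated in the comments further down.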

Minimal yet complete reproducer code (or URL to code)

import org.assertj.core.api.Assertions;
import org.junit.jupiter.api.Test;

import io.netty.handler.codec.http2.DefaultHttp2Headers;
import io.netty.handler.codec.http2.Http2Error;
import io.netty.handler.codec.http2.Http2Exception;
import io.netty.util.AsciiString;

public class Http2HeaderNameTest {
    @Test
    void shouldRejectNonUsAsciiValues() {
        // U+1F631 : 😱 : F0 9F 98 B1
        byte[] buf = new byte[] {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0xB1};
        DefaultHttp2Headers headers = new DefaultHttp2Headers();
        Assertions.assertThatThrownBy(() -> headers.add(new AsciiString(buf), "test"))
                .isInstanceOf(Http2Exception.class)
                .hasMessageStartingWith("invalid header name")
                .hasFieldOrPropertyWithValue("error", Http2Error.PROTOCOL_ERROR);
    }
}

Netty version

4.1.72

JVM version (e.g. `java -version`)

$ java --version
openjdk 11.0.11 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)

OS version (e.g. `uname -a`)

$ uname -a
Darwin *** 20.6.0 Darwin Kernel Version 20.6.0***

Issue Analytics

State:
Created 2 years ago
Comments:5 (4 by maintainers)

Top GitHub Comments

idelpivnitskiy commented, Jan 11, 2022

Looks reasonable, HTTP/1.1 RFC defines the header name as:

header-field   = field-name ":" OWS field-value OWS
field-name     = token
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
    "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA

HTTP/2 says the same, with the added requirement that names be lowercase.
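As a rough illustration of that grammar, the sketch below checks a header name byte by byte against `tchar`, with uppercase letters excluded for HTTP/2. It is not Netty's implementation: the class and method names are hypothetical, and it deliberately ignores HTTP/2 pseudo-header fields such as `:method`, which start with a colon and would need separate handling.

```java
import java.nio.charset.StandardCharsets;

final class Http2FieldNameCheck {
    // The tchar symbols from the grammar quoted above, excluding DIGIT and ALPHA.
    private static final String TCHAR_SYMBOLS = "!#$%&'*+-.^_`|~";

    /** Returns true when the byte is a tchar that HTTP/2 also allows (no uppercase). */
    static boolean isLowercaseTchar(byte value) {
        if (value >= 'a' && value <= 'z') {
            return true;
        }
        if (value >= '0' && value <= '9') {
            return true;
        }
        // Non-ASCII bytes are negative and NUL is zero; neither is a tchar.
        return value > 0 && TCHAR_SYMBOLS.indexOf(value) >= 0;
    }

    /** Returns true when the name is a non-empty token made only of lowercase tchars. */
    static boolean isValidHttp2FieldName(byte[] name) {
        if (name.length == 0) {
            return false; // token = 1*tchar, so at least one character is required
        }
        for (byte b : name) {
            if (!isLowercaseTchar(b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] emoji = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0xB1};
        System.out.println(isValidHttp2FieldName("content-type".getBytes(StandardCharsets.US_ASCII))); // true
        System.out.println(isValidHttp2FieldName("Content-Type".getBytes(StandardCharsets.US_ASCII))); // false
        System.out.println(isValidHttp2FieldName(emoji));                                              // false
    }
}
```

A whitelist like this rejects non-ASCII bytes, control characters, and separators in one pass, whereas a blacklist has to enumerate every prohibited value.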

NilsRenaud commented, Jan 12, 2022

This quote of HTTP/1 doesn’t argue your point. It says superset, which means things like Latin-1 and UTF-8 are fair game.

@ejona86 you’re right indeed and the RFC extract from @idelpivnitskiy is the only proof I needed to refer to.

I don’t understand the corruption argument. When is that done? If you are converting an AsciiString containing non-ASCII to a String, you have to know the encoding. Basically, there’s nothing that says 0x00 0xf0 is not the valid conversion of 0xf0 in your example.

@ejona86 Indeed, instead of corruption I should have said that the header name bytes are read as Latin-1 encoded text, then encoded as UTF-16 in Java’s char to form a String.