TLSv1.3 can fail with HTTP/2 and Session Tickets Enabled
Netty version
4.1.45 + TCNative 2.0.28
JVM version (e.g. java -version)
Java 1.8
OS version (e.g. uname -a)
Linux / Mac
Repro
The steps to reproduce this are fairly difficult, and I don’t know enough of the OpenSSL API, but I can give the manual steps I took to get here. Netflix is trying to enable TLSv1.3 on some of its servers, but it results in some corrupted SSL connections. This appears most commonly during connection startup, but I think it can happen at any point.
Typical errors look like SSL_ERROR_RX_RECORD_TOO_LONG in Firefox, or ERR_SSL_PROTOCOL_ERROR in Chrome, but this is in fact due to data corruption in Netty. The core issue is a bad interaction between the two overloads of SslHandler.wrap:
At the bottom, on line 1043, if the engine.wrap call results in a BUFFER_OVERFLOW, it resizes the buffer to try again. The bug here is that when this happens, the original data that was attempted to be written gets lost. In my case, the first 68 or so bytes are discarded, leaving a partial response to be written out. This is the first half of the bug.
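The invariant that the retry described above has to preserve can be shown with a toy model. This is a sketch only: ToyEngine, Status and wrapWithRetry are made-up names, not Netty or OpenSSL code, and the toy engine simply emits some pending bytes (standing in for a session ticket) ahead of the application data. The point is that a BUFFER_OVERFLOW result consumes and produces nothing, so growing the destination and retrying must leave every byte intact and in order.

```java
import java.nio.ByteBuffer;

final class OverflowRetrySketch {
    enum Status { OK, BUFFER_OVERFLOW }

    /** Hypothetical stand-in for an SSL engine that has extra bytes queued before app data. */
    static final class ToyEngine {
        private byte[] pending;

        ToyEngine(byte[] pending) { this.pending = pending; }

        Status wrap(ByteBuffer src, ByteBuffer dst) {
            int needed = pending.length + src.remaining();
            if (dst.remaining() < needed) {
                return Status.BUFFER_OVERFLOW; // nothing consumed, nothing produced
            }
            dst.put(pending);       // queued bytes go out first
            pending = new byte[0];
            dst.put(src);           // then the application data
            return Status.OK;
        }
    }

    /** Grows the destination until the wrap fits, keeping anything already written. */
    static ByteBuffer wrapWithRetry(ToyEngine engine, ByteBuffer src, int initialCapacity) {
        ByteBuffer out = ByteBuffer.allocate(initialCapacity);
        while (engine.wrap(src, out) == Status.BUFFER_OVERFLOW) {
            ByteBuffer bigger = ByteBuffer.allocate(out.capacity() * 2);
            out.flip();
            bigger.put(out);        // carry over bytes produced so far (none in this toy)
            out = bigger;
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        byte[] ticket = new byte[400];              // hypothetical ticket size
        java.util.Arrays.fill(ticket, (byte) 1);
        byte[] app = new byte[46];                  // 46 bytes, as in the report
        java.util.Arrays.fill(app, (byte) 2);

        ByteBuffer out = wrapWithRetry(new ToyEngine(ticket), ByteBuffer.wrap(app), 68);
        System.out.println("bytes produced: " + out.remaining());
    }
}
```

Starting from a 68 byte buffer, the loop grows the destination until both the queued bytes and the 46 bytes of application data fit, and nothing is dropped along the way.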
The second part is the setup to this bug, and is the other wrap overload:
On line 821, the call to allocateOutNetBuf attempts to create a buffer large enough to hold the write. In my case, the readable data is 46 bytes. This results in a call to ReferenceCountedOpenSslEngine.calculateMaxLengthForWrap(), which adds 22 bytes of extra headroom, resulting in a 68 byte buffer. This is normally the correct amount if session tickets are not enabled. When they are, several hundred additional bytes are needed. These session ticket bytes are included by the OpenSSL library, and don’t appear to be accounted for by Netty.
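The arithmetic can be written out as a small sketch. One way to account for the 22 bytes, assuming a TLS 1.3 AES-GCM cipher suite like the one in the handshake log [1], is a 5 byte record header, 1 byte for the inner content type, and a 16 byte authentication tag; that matches the 5 + 63 byte record in [2]. This is not a claim about how calculateMaxLengthForWrap() computes its result. The names below are made up, and the 400 byte ticket size is a placeholder for the "several hundred" bytes mentioned above.

```java
final class WrapSizeSketch {
    static final int RECORD_HEADER = 5;       // content type (1) + version (2) + length (2)
    static final int INNER_CONTENT_TYPE = 1;  // TLS 1.3 appends the real type to the plaintext
    static final int AEAD_TAG = 16;           // AES-GCM authentication tag

    /** Size of one TLS 1.3 application data record carrying the given plaintext. */
    static int applicationRecordSize(int plaintextBytes) {
        return RECORD_HEADER + plaintextBytes + INNER_CONTENT_TYPE + AEAD_TAG;
    }

    /** True if the buffer can hold the queued ticket bytes plus the application record. */
    static boolean fits(int bufferCapacity, int plaintextBytes, int pendingTicketBytes) {
        return bufferCapacity >= pendingTicketBytes + applicationRecordSize(plaintextBytes);
    }

    public static void main(String[] args) {
        int capacity = applicationRecordSize(46);                             // 68
        System.out.println("capacity: " + capacity);
        System.out.println("without tickets: " + fits(capacity, 46, 0));      // true
        System.out.println("with tickets:    " + fits(capacity, 46, 400));    // false (400 is hypothetical)
    }
}
```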
This series of events is most common when turning on TLSv1.3, HTTP/2, and Session Tickets. In my experimentation, the SSL handshake actually succeeds, but then crashes shortly after. The sequence of events looks like:
- Call openssl s_client -tls1_3 -connect 127.0.0.1:7006 -alpn h2 -debug -msg
- SSL Handshake proceeds successfully, resulting in the client logging a successful handshake[1].
- On the server side, SslHandler.setHandshakeSuccess is invoked, and fires the event up the pipeline. ApplicationProtocolNegotiationHandler captures the handshake event, sees h2 has been picked, installs Http2FrameCodec, and invokes handlerAdded.
- This causes the frame window to be written and flushed, along with the initial settings.
- It calls all the way back down into the wrap() function as mentioned above, trying to wrap the 46 bytes of application data.
- In a normal, non session ticket case, this sends the initial 46 bytes, which looks something like [2]. Note the initial 17 03 03, which correctly indicates this is application data.
- In the failure case, Session Tickets get added in by the SSL library, the response is too big, causing engine.wrap to return -1. When attempted again, the session tickets are written out, but they are missing the 5 byte header that identifies them.
- OpenSSL ends up writing a bogus 450 byte packet, which the client reads, misinterprets as a bad length, and closes the connection.
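The 5 byte header mentioned in the steps above is the TLS record header: one byte of content type, two bytes of legacy version, and two bytes of length. The sketch below decodes it using only JDK classes; the class and method names are made up for illustration. It also shows why a missing header tends to surface as a "record too long" error: if the first five bytes of ciphertext are read as a header, the length field is effectively random, and TLS 1.3 caps the ciphertext length at 2^14 + 256 bytes.

```java
final class TlsRecordHeader {
    static final int APPLICATION_DATA = 0x17;
    static final int MAX_CIPHERTEXT_LENGTH = 16384 + 256; // TLS 1.3 limit on the record length field

    final int contentType;
    final int version;
    final int length;

    private TlsRecordHeader(int contentType, int version, int length) {
        this.contentType = contentType;
        this.version = version;
        this.length = length;
    }

    /** Decodes the first five bytes of a TLS record. */
    static TlsRecordHeader parse(byte[] header) {
        if (header.length < 5) {
            throw new IllegalArgumentException("a TLS record header is 5 bytes");
        }
        int type = header[0] & 0xFF;
        int version = ((header[1] & 0xFF) << 8) | (header[2] & 0xFF);
        int length = ((header[3] & 0xFF) << 8) | (header[4] & 0xFF);
        return new TlsRecordHeader(type, version, length);
    }

    boolean isApplicationData() { return contentType == APPLICATION_DATA; }

    boolean hasPlausibleLength() { return length <= MAX_CIPHERTEXT_LENGTH; }

    public static void main(String[] args) {
        // Header bytes taken from log [2]: 17 03 03 00 3f
        TlsRecordHeader good = parse(new byte[] {0x17, 0x03, 0x03, 0x00, 0x3f});
        System.out.println(good.isApplicationData() + " length=" + good.length);

        // Illustration only: the first five ciphertext bytes from [2] misread as a header.
        TlsRecordHeader bogus = parse(new byte[] {0x3c, 0x0f, 0x39, (byte) 0xfe, 0x64});
        System.out.println(bogus.hasPlausibleLength() + " length=" + bogus.length);
    }
}
```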
I was able to force this to succeed by manually growing the out buffer in the wrap call to a very large size. This allows the initial engine.wrap to succeed and send the session ticket through, followed by the application data. When this happens, openssl prints out Post-Handshake New Session Ticket arrived.
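For comparison, with the standard javax.net.ssl API a conservative destination size is usually taken from SSLSession.getPacketBufferSize(). The sketch below makes that call on a default JDK engine. It only illustrates the "very large buffer" workaround; whether this is an appropriate fix inside SslHandler is not something the example establishes. The class and helper names are made up.

```java
import java.nio.ByteBuffer;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

final class ConservativeOutBuffer {
    /** Allocates a destination buffer sized for the largest packet the session expects. */
    static ByteBuffer allocateFor(SSLEngine engine) {
        return ByteBuffer.allocate(engine.getSession().getPacketBufferSize());
    }

    /** Creates an engine from the JDK default context (used here only for demonstration). */
    static SSLEngine defaultEngine() {
        try {
            return SSLContext.getDefault().createSSLEngine();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer out = allocateFor(defaultEngine());
        // Far larger than the 68 bytes computed for a 46 byte write.
        System.out.println("out capacity: " + out.capacity());
    }
}
```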
@normanmaurer I’m really not sure how to fix this. I have packet captures of most of this, with the failure case, the success case, and the non-session ticket case. I don’t know enough of the OpenSSL API to make a call on what should happen.
[1]:
SSL handshake has read 3475 bytes and written 304 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
ALPN protocol: h2
Early data was not sent
Verify return code: 0 (ok)
---
[2]:
read from 0x7fed4e50e0b0 [0x7fed4f00c803] (5 bytes => 5 (0x5))
0000 - 17 03 03 00 3f ....?
<<< ??? [length 0005]
17 03 03 00 3f
read from 0x7fed4e50e0b0 [0x7fed4f00c808] (63 bytes => 63 (0x3F))
0000 - 3c 0f 39 fe 64 c2 26 c5-8d 97 29 f0 ea ba ee aa <.9.d.&...).....
0010 - 0f 58 d1 1b 3d 67 3a 36-3d 52 8f f6 ec ae a6 2f .X..=g:6=R...../
0020 - 04 78 a4 16 97 40 40 2f-f2 58 ec a2 eb bd d3 24 .x...@@/.X.....$
0030 - 60 33 88 8c 8d 77 f2 6f-5e 9f ec 29 2d e3 2f `3...w.o^..)-./
<<< TLS 1.3 [length 0001]
17
Top GitHub Comments
@carl-mastrangelo ok good news is that I can reproduce it… Now the fun begins. Will keep you posted.
Fixed by https://github.com/netty/netty/pull/10063 …