TLSv1.3 can fail with HTTP/2 and Session Tickets Enabled


Netty version

4.1.45 + TCNative 2.0.28

JVM version (e.g. java -version)

Java 1.8

OS version (e.g. uname -a)

Linux / Mac

Repro

The steps to reproduce this are fairly difficult, and I don’t know enough of the OpenSSL API, but I can give the manual steps I took to get here. Netflix is trying to enable TLSv1.3 on some of its servers, but it results in some corrupted SSL connections. This appears most commonly during connection startup, but I think it can happen at any point.

Typical errors look like SSL_ERROR_RX_RECORD_TOO_LONG in Firefox, or ERR_SSL_PROTOCOL_ERROR in Chrome, but this is in fact due to data corruption in Netty. The core issue is a bad interaction between the two overloads of SslHandler.wrap:

https://github.com/netty/netty/blob/136db8680afa73c7491277d6fed1e85e703fa905/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L1003-L1058

At the bottom, on line 1043, if the engine.wrap call results in a BUFFER_OVERFLOW, it resizes the buffer and tries again. The bug here is that when this happens, the data that was originally being written gets lost. In my case, the first 68 or so bytes are discarded, leaving a partial response to be written out. This is the first half of the bug.
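The invariant a resize-and-retry loop has to preserve can be shown with a toy model. This is a sketch only: `ToyEngine` and `wrapWithRetry` are hypothetical names, not Netty or OpenSSL APIs, and the model assumes an overflowing call consumes nothing, which is exactly the property the real code path appears to violate.

```java
import java.nio.ByteBuffer;

/** Toy model only: not Netty's SslHandler and not the OpenSSL API. */
public class WrapRetrySketch {

    /** Hypothetical engine that may have its own bytes queued ahead of application data. */
    public static final class ToyEngine {
        private final ByteBuffer pending; // engine-generated bytes, e.g. a session ticket

        public ToyEngine(byte[] pendingBytes) {
            this.pending = ByteBuffer.wrap(pendingBytes);
        }

        /** Returns false (overflow analogue) and consumes nothing when dst is too small. */
        public boolean wrap(ByteBuffer src, ByteBuffer dst) {
            int needed = pending.remaining() + src.remaining();
            if (dst.remaining() < needed) {
                return false; // src and pending keep their positions
            }
            dst.put(pending); // queued engine bytes go out first
            dst.put(src);     // then the application data
            return true;
        }
    }

    /** Wraps src, doubling the destination on overflow, without dropping any bytes. */
    public static ByteBuffer wrapWithRetry(ToyEngine engine, ByteBuffer src, int initialCapacity) {
        ByteBuffer dst = ByteBuffer.allocate(Math.max(1, initialCapacity));
        while (!engine.wrap(src, dst)) {
            // Nothing was written on overflow, so swapping in a larger buffer loses no data.
            dst = ByteBuffer.allocate(dst.capacity() * 2);
        }
        dst.flip(); // make the written bytes readable
        return dst;
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine(new byte[400]); // 400 is an arbitrary stand-in size
        ByteBuffer out = wrapWithRetry(engine, ByteBuffer.wrap(new byte[46]), 68);
        System.out.println("bytes written: " + out.remaining());
    }
}
```

If the retry kept only what fit after the resize, the output would be short by exactly the bytes that were in flight during the failed attempt, which matches the partial response described above.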

The second part is the setup to this bug, and is the other wrap overload:

https://github.com/netty/netty/blob/netty-4.1.45.Final/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L801-L882

On line 821, the call to allocateOutNetBuf attempts to create a buffer large enough to hold the write. In my case, the readable data is 46 bytes. This results in a call to ReferenceCountedOpenSslEngine.calculateMaxLengthForWrap(), which adds 22 bytes of extra headroom, resulting in a 68-byte buffer. This is normally the correct amount if session tickets are not enabled. When they are, several hundred additional bytes are needed. These session ticket bytes are included by the OpenSSL library, and don't appear to be accounted for by Netty.
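The arithmetic can be checked with a small sketch. The 22-byte headroom equals the framing of a single TLS 1.3 AES-GCM record (5-byte header, 1 inner content-type byte, 16-byte tag); whether Netty derives its figure this way is an assumption, not something the issue states. The class and method names are hypothetical, and the 400-byte ticket size is a placeholder for "several hundred".

```java
public class WrapSizingSketch {
    // TLS 1.3 record framing for an AEAD suite such as TLS_AES_256_GCM_SHA384.
    static final int RECORD_HEADER = 5;      // type (1) + legacy version (2) + length (2)
    static final int INNER_CONTENT_TYPE = 1; // content-type byte carried inside the ciphertext
    static final int AEAD_TAG = 16;          // AES-GCM authentication tag

    /** Per-record overhead; evaluates to the 22 bytes of headroom quoted in the issue. */
    public static int recordOverhead() {
        return RECORD_HEADER + INNER_CONTENT_TYPE + AEAD_TAG;
    }

    /** Bytes needed for one record carrying plaintextLen bytes and nothing else. */
    public static int sizeForAppData(int plaintextLen) {
        return plaintextLen + recordOverhead();
    }

    /** Bytes needed when the engine also has output of its own queued (e.g. a ticket). */
    public static int sizeWithPending(int plaintextLen, int pendingEngineBytes) {
        return sizeForAppData(plaintextLen) + pendingEngineBytes;
    }

    public static void main(String[] args) {
        // 46 bytes of application data, as in the issue.
        System.out.println("app data only: " + sizeForAppData(46));
        // Assumption: 400 stands in for the unspecified "several hundred" ticket bytes.
        System.out.println("with pending bytes: " + sizeWithPending(46, 400));
    }
}
```

The point of the sketch is only that a buffer sized from the plaintext length alone cannot cover output the engine decides to emit on its own.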

This series of events is most common when turning on TLSv1.3, HTTP/2, and Session Tickets together. In my experimentation, the SSL handshake actually succeeds, but the connection fails shortly after. The sequence of events looks like:

  1. Call openssl s_client -tls1_3 -connect 127.0.0.1:7006 -alpn h2 -debug -msg
  2. SSL Handshake proceeds successfully, resulting in the client logging a successful handshake[1].
  3. On the server side, SslHandler.setHandshakeSuccess is invoked, and fires the event up the pipeline.
  4. ApplicationProtocolNegotiationHandler captures the handshake event, sees h2 has been picked, and installs Http2FrameCodec, and invokes handlerAdded.
  5. This causes the frame window to be written and flushed, along with the initial settings.
  6. It calls all the way back down into the wrap() function as mentioned above, trying to wrap the 46 bytes of application data.
  7. In a normal, non-session-ticket case, this sends the initial 46 bytes, which looks something like [2]. Note the initial 17 03 03, which correctly indicates this is application data.
  8. In the failure case, Session Tickets get added in by the SSL library and the response is too big, causing engine.wrap to return -1. When attempted again, the session tickets are written out, but they are missing the 5-byte header that identifies them.
  9. OpenSSL ends up writing a bogus 450-byte packet, which the client reads, misinterprets as a bad length, and closes the connection.
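To make steps 7 to 9 concrete, here is a minimal sketch of how a receiver reads the 5-byte record header. The class is hypothetical. The first input is the header from trace [2]; the second uses the first five payload bytes from the same trace as a stand-in for headerless data, since the actual bytes of the failing packet are not given in the issue.

```java
public final class TlsRecordHeaderSketch {
    /** Largest ciphertext length a TLS 1.3 record may declare: 2^14 + 256 (RFC 8446). */
    public static final int MAX_CIPHERTEXT_LENGTH = (1 << 14) + 256;

    public final int contentType; // 23 (0x17) means application_data
    public final int version;     // legacy record version, 0x0303 on the wire for TLS 1.3
    public final int length;      // number of ciphertext bytes that follow the header

    private TlsRecordHeaderSketch(int contentType, int version, int length) {
        this.contentType = contentType;
        this.version = version;
        this.length = length;
    }

    /** Interprets the first five bytes as type (1), version (2), length (2), big-endian. */
    public static TlsRecordHeaderSketch parse(byte[] b) {
        if (b.length < 5) {
            throw new IllegalArgumentException("need at least 5 bytes");
        }
        int type = b[0] & 0xFF;
        int version = ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
        int length = ((b[3] & 0xFF) << 8) | (b[4] & 0xFF);
        return new TlsRecordHeaderSketch(type, version, length);
    }

    /** True when the type is one of 20..23 and the declared length is within the limit. */
    public boolean isPlausible() {
        return contentType >= 20 && contentType <= 23 && length <= MAX_CIPHERTEXT_LENGTH;
    }

    public static void main(String[] args) {
        byte[] good = {0x17, 0x03, 0x03, 0x00, 0x3f};                      // header from trace [2]
        byte[] headerless = {0x3c, 0x0f, 0x39, (byte) 0xfe, 0x64};         // payload bytes read as a header
        TlsRecordHeaderSketch a = parse(good);
        TlsRecordHeaderSketch b = parse(headerless);
        System.out.println(a.contentType + " " + a.length + " " + a.isPlausible());
        System.out.println(b.contentType + " " + b.length + " " + b.isPlausible());
    }
}
```

When encrypted bytes land where a header is expected, the two length bytes are effectively random, so a declared length far above the protocol limit is the likely outcome. That is consistent with Firefox reporting SSL_ERROR_RX_RECORD_TOO_LONG.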

I was able to force this to succeed by manually growing the out buffer in the wrap call to a very large size. This allows the initial engine.wrap to succeed and send the session ticket through, followed by the application data. When this happens, openssl prints out Post-Handshake New Session Ticket arrived.

@normanmaurer I'm really not sure how to fix this. I have packet captures of most of this: the failure case, the success case, and the non-session-ticket case. I don't know enough of the OpenSSL API to make a call on what should happen.

[1]:

SSL handshake has read 3475 bytes and written 304 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
ALPN protocol: h2
Early data was not sent
Verify return code: 0 (ok)
---

[2]:

read from 0x7fed4e50e0b0 [0x7fed4f00c803] (5 bytes => 5 (0x5))
0000 - 17 03 03 00 3f                                    ....?
<<< ??? [length 0005]
    17 03 03 00 3f
read from 0x7fed4e50e0b0 [0x7fed4f00c808] (63 bytes => 63 (0x3F))
0000 - 3c 0f 39 fe 64 c2 26 c5-8d 97 29 f0 ea ba ee aa   <.9.d.&...).....
0010 - 0f 58 d1 1b 3d 67 3a 36-3d 52 8f f6 ec ae a6 2f   .X..=g:6=R...../
0020 - 04 78 a4 16 97 40 40 2f-f2 58 ec a2 eb bd d3 24   .x...@@/.X.....$
0030 - 60 33 88 8c 8d 77 f2 6f-5e 9f ec 29 2d e3 2f      `3...w.o^..)-./
<<< TLS 1.3 [length 0001]
    17

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 22 (18 by maintainers)

Top GitHub Comments

4 reactions
normanmaurer commented, Feb 25, 2020

@carl-mastrangelo ok good news is that I can reproduce it… Now the fun begins. Will keep you posted.

