
Channel inactive not fired after SslCloseCompletionEvent(SUCCESS) causing connection still in pool hence io.netty.handler.ssl.SslClosedEngineException: SSLEngine closed already


From time to time, a connection remains in the pool when it should not.

Expected Behavior

When SslCloseCompletionEvent(SUCCESS) is not handled correctly, the connection remains in the pool, and “io.netty.handler.ssl.SslClosedEngineException: SSLEngine closed already” is thrown on the next use of that connection. A channel ‘INACTIVE’ event should be fired after SslCloseCompletionEvent(SUCCESS).

Here is a good example of a connection reused multiple times (7 times): at the end, the connection is unregistered as soon as the SslCloseCompletionEvent is fired (see attached screenshot).

Full logs : OK.log

Actual Behavior

Under undetermined circumstances, SslCloseCompletionEvent is not followed by INACTIVE. In this example, the connection is reused 5 min 6 s after SslCloseCompletionEvent. Contrary to what happened in the “good” example above, only one ‘READ COMPLETE’ event is fired; maybe this is why fireChannelInactive is not triggered (see attached screenshot).

Full logs : NOK.log

Steps to Reproduce

Unfortunately, I have no reproducer to provide, since this happens randomly.

Possible Solution

(Bad idea, as it turns out: the SslReadHandler is no longer in the pipeline when this happens.) In reactor.netty.tcp.SslProvider.SslReadHandler.userEventTriggered, add:

    if (evt instanceof SslCloseCompletionEvent) {
        ctx.fireChannelInactive();
    }
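For reference, the fix the maintainers eventually merged (PR #2518, mentioned in the comments below) takes a different approach: close the channel itself when the peer's close_notify completes, which in turn fires channelInactive and lets the pool evict the connection. A minimal, hypothetical sketch of that idea, not the actual patch (the handler name is made up):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.ssl.SslCloseCompletionEvent;

// Hypothetical sketch: close the channel when the peer's close_notify
// is received, instead of firing channelInactive directly.
public class CloseOnCloseNotifyHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof SslCloseCompletionEvent
                && ((SslCloseCompletionEvent) evt).isSuccess()) {
            // Closing the channel fires channelInactive down the pipeline,
            // so the connection pool can unregister the connection.
            ctx.close();
        }
        super.userEventTriggered(ctx, evt);
    }
}
```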

Your Environment

  • Reactor version(s) used: reactor-netty-http : 1.0.23
  • Other relevant libraries versions (eg. netty, …): netty : 4.1.81.Final
  • JVM version (java -version): openjdk version “1.8.0_342”
  • OS and version (eg. uname -a): Linux 18.04.1-Ubuntu SMP Mon Aug 23 23:07:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

cyfax commented on Sep 30, 2022, quoting pderop’s reply:

hi @cyfax ,

After checking with the Netty team, we changed our mind, and the PR #2518 will be merged soon, which closes the connection when receiving a close_notify, this should resolve in particular your issue in case the close_notify is received while the connection is idle in the connection pool.

But as described in the PR, there are racy conditions, e.g. when the close_notify is received while a request is being written, or when the request has been flushed and the response is awaited; in these cases you will get a PrematureCloseException, which we can’t avoid.

Now, as said before, for the moment the workaround suggested here should also address the issue. In your last message, evictionInterval is set to 0; that’s probably why the timer was not triggered, so maybe you need to double-check why evictionInterval=PT0S and set it to a positive value like 5 seconds.

thanks.

Glad to know my suggestion and proposed fix were quite good 😃

May I ask when this fix is scheduled to be released? Never mind, I found it (see attached screenshot).

pderop commented on Sep 27, 2022

@cyfax,

some updates:

I’m still investigating, sorry for the slight delay (please ignore my previous message for the moment).

So, based on the last NOK.1.log and the tcpdump, let’s focus on port 48702. Indeed, it seems that the client waits for the server’s FIN, which the server never sends after the initial close_notify; at some point in time, when the client acquires and writes to the connection, it gets the “SSLEngine closed already” exception, and the client then closes the connection.

let’s see this from your provided logs:

at 11:02:30.234595 from tcpdump: the server sends the close_notify to the client, which acknowledges it immediately:

   44 2022-09-26 11:02:30.234595  xx.xx.xx.xx → 11.0.0.110   TLSv1.2 97 Encrypted Alert
   45 2022-09-26 11:02:30.234632   11.0.0.110 → xx.xx.xx.xx  TCP 66 48702 → 443 [ACK] Seq=1955 Ack=6348 Win=56960 Len=0 TSval=3688553944 TSecr=2932835261
   46 2022-09-26 11:02:30.235310   11.0.0.110 → xx.xx.xx.xx  TLSv1.2 97 Encrypted Alert
   47 2022-09-26 11:02:30.276361  xx.xx.xx.xx → 11.0.0.110   TCP 66 443 → 48702 [ACK] Seq=6348 Ack=1986 Win=64128 Len=0 TSval=2932835303 TSecr=3688553945

From NOK.1.log, we see that the client has received the close_notify at 11:02:30.235

11:02:30.235 DEBUG 19934 --- [reactor-http-epoll-3] reactor.netty.http.client.HttpClient     : [ed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443] USER_EVENT: SslCloseCompletionEvent(SUCCESS)

But after that, the server does not send a FIN to the client, even though the client has acknowledged the server’s close_notify with its own close_notify message (see 11:02:30.235310).

And about 1 minute 19 seconds later, at 11:03:49.680, the client acquires the connection from the pool, writes to it, and gets the “SslClosedEngineException: SSLEngine closed already”:

2022-09-26 11:03:49.680 DEBUG 19934 --- [reactor-http-epoll-3] r.n.http.client.HttpClientOperations     : [ed8b90b8-5, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443] Outbound error happened

io.netty.handler.ssl.SslClosedEngineException: SSLEngine closed already

then, at 11:03:49.696, the client closes the connection:

2022-09-26 11:03:49.695 TRACE 19934 --- [reactor-http-epoll-3] reactor.netty.channel.ChannelOperations  : [ed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443] Disposing ChannelOperation from a channel
2022-09-26 11:03:49.696 DEBUG 19934 --- [reactor-http-epoll-3] r.n.r.DefaultPooledConnectionProvider    : [ed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443] onStateChange(POST{uri=/card/event, connection=PooledConnection{channel=[id: 0xed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443]}}, [response_completed])
2022-09-26 11:03:49.696 DEBUG 19934 --- [reactor-http-epoll-3] r.n.r.DefaultPooledConnectionProvider    : [ed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443] onStateChange(POST{uri=/card/event, connection=PooledConnection{channel=[id: 0xed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443]}}, [disconnecting])
2022-09-26 11:03:49.696 DEBUG 19934 --- [reactor-http-epoll-3] reactor.netty.http.client.HttpClient     : [ed8b90b8, L:/11.0.0.110:48702 - R:fake.server.com/xx.xx.xx.xx:443] CLOSE

and at 11:03:49.701184, from tcpdump, we then see that the FIN is sent from the client to the server:

   48 2022-09-26 11:03:49.701184   11.0.0.110 → xx.xx.xx.xx  TCP 66 48702 → 443 [FIN, ACK] Seq=1986 Ack=6348 Win=56960 Len=0 TSval=3688633411 TSecr=2932835303
   49 2022-09-26 11:03:49.744334  xx.xx.xx.xx → 11.0.0.110   TCP 66 443 → 48702 [ACK] Seq=6348 Ack=1987 Win=64128 Len=0 TSval=2932914771 TSecr=3688633411

So, first, it’s strange that the server does not close the connection after receiving the close_notify ack sent by the client.

Now, we are still investigating, and after discussion with the team, we do not want to make a patch for the moment: closing the connection when receiving the close_notify would only resolve your particular scenario, but would not avoid scenarios where you have already acquired the connection. In that case, if the close_notify is received while you are writing to the acquired connection, you will get a “Connection prematurely closed BEFORE response” even with the patch. Moreover, we are still investigating whether something else could be done.

In the meantime:

Please consider doing what Violeta suggested in this old issue, which seems to be the same problem: there, the server was using a Keep-Alive timeout and would send a close_notify after some period of connection inactivity.

If you are using Tomcat without any specific configuration, the Keep-Alive timeout is 60 seconds if I’m correct (check https://github.com/reactor/reactor-netty/issues/1318#issuecomment-702619679). And indeed, in your tcpdump.txt, the server sends a close_notify after around 60 seconds of inactivity:

   42 2022-09-26 11:01:29.755141  xx.xx.xx.xx → 11.0.0.110   TLSv1.2 100 Application Data
   43 2022-09-26 11:01:29.755149   11.0.0.110 → xx.xx.xx.xx  TCP 66 48702 → 443 [ACK] Seq=1955 Ack=6317 Win=56960 Len=0 TSval=3688493465 TSecr=2932774782
   44 2022-09-26 11:02:30.234595  xx.xx.xx.xx → 11.0.0.110   TLSv1.2 97 Encrypted Alert

So, please consider setting a maxIdleTime on Reactor Netty’s connection pool to less than the value on the server (check https://github.com/reactor/reactor-netty/issues/1318#issuecomment-702668918).
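The two recommendations above (a maxIdleTime below the server's Keep-Alive timeout, and a positive eviction interval instead of PT0S) can be sketched with Reactor Netty's ConnectionProvider builder. The values and pool name here are illustrative assumptions, not values from the issue:

```java
import java.time.Duration;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

// Illustrative sketch: evict idle pooled connections before the server's
// ~60 s Keep-Alive timeout has a chance to send a close_notify.
public class PoolConfigExample {
    public static void main(String[] args) {
        ConnectionProvider provider = ConnectionProvider.builder("custom")
                // Drop connections idle for more than 50 s (below the
                // server's assumed 60 s Keep-Alive timeout).
                .maxIdleTime(Duration.ofSeconds(50))
                // Run the background eviction task periodically; with a
                // zero interval (evictionInterval=PT0S) it never runs.
                .evictInBackground(Duration.ofSeconds(5))
                .build();

        HttpClient client = HttpClient.create(provider);
        // use the client as usual...
    }
}
```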
