AsynchronousTlsChannelGroup#processPendingInterests can throw CancelledKeyException
In tests of the MongoDB Java driver that use this library, I’ve seen occasional, non-deterministic failures where AsynchronousTlsChannelGroup#processPendingInterests throws CancelledKeyException, causing AsynchronousTlsChannelGroup.loop to exit. It happens in cases where we are forcing the server to close the socket in order to test failure scenarios.

I’m not exactly sure why this is happening, but I do see that in AsynchronousTlsChannelGroup.loop there is already code that wraps calls to java.nio.channels.SelectionKey#interestOps(int) in a try/catch of CancelledKeyException. Does it make sense to do a similar thing in AsynchronousTlsChannelGroup#processPendingInterests, e.g.:

    private void processPendingInterests() {
        for (SelectionKey key : selector.keys()) {
            RegisteredSocket socket = (RegisteredSocket) key.attachment();
            int pending = socket.pendingOps.getAndSet(0);
            if (pending != 0) {
                try {
                    key.interestOps(key.interestOps() | pending);
                } catch (CancelledKeyException e) {
                    // ignore
                }
            }
        }
    }

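For anyone who wants to try the guarded update outside the library, here is a minimal, self-contained sketch of the same pattern. It is not code from the library: the class name GuardedInterestOps and the helpers tryAddInterest and demo are hypothetical, and a java.nio.channels.Pipe stands in for a real socket so that the example needs no network.

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical illustration class; not part of the library discussed above.
public class GuardedInterestOps {

    // Adds ops to the key's interest set.
    // Returns false instead of throwing if the key has been cancelled.
    static boolean tryAddInterest(SelectionKey key, int ops) {
        try {
            key.interestOps(key.interestOps() | ops);
            return true;
        } catch (CancelledKeyException e) {
            // The channel was closed or the key cancelled concurrently; nothing left to update.
            return false;
        }
    }

    // Returns {result for a live key, result for a cancelled key}.
    static boolean[] demo() throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, 0);
                boolean live = tryAddInterest(key, SelectionKey.OP_READ);
                key.cancel();
                boolean cancelled = tryAddInterest(key, SelectionKey.OP_READ);
                return new boolean[] {live, cancelled};
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("live key updated: " + result[0]);
        System.out.println("cancelled key updated: " + result[1]);
    }
}
```

Returning a boolean rather than silently ignoring the exception is only a choice made for this sketch, so that the outcome is observable; the proposal above simply ignores the exception.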
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
@martinandersson Below is my explanation of the problem and the fix.

The method AsynchronousTlsChannelGroup.RegisteredSocket.close may be called by any thread as a result of it calling AsynchronousTlsChannel.close. AsynchronousTlsChannelGroup.RegisteredSocket.close calls SelectionKey.cancel, which [...]

If we now look at Selector, we can see that [...]

This means that selector.keys in AsynchronousTlsChannelGroup.processPendingInterests may return cancelled SelectionKeys. Calling key.interestOps on such SelectionKeys results in CancelledKeyException as per the documentation of SelectionKey.interestOps.

Thus, depending on how AsynchronousTlsChannelGroup and AsynchronousTlsChannel are used in a program, the program may have a race condition.

Two approaches are possible:
1. [...] SelectionKey.cancel, Selector.select, Selector.keys, SelectionKey.isValid, SelectionKey.interestOps methods in AsynchronousTlsChannelGroup in such a way that there can be no such race condition anymore;
2. [...] CancelledKeyException when it happens as a result of a program having the race condition.

The second approach seems (maybe surprisingly) more optimal in this case because it is both simpler and introduces less performance overhead, assuming that CancelledKeyException is thrown much more rarely than the method SelectionKey.cancel is called.

The Selector API is already racy here. But a lot of “closing workflows” are racy and benign, as typically not much happens after a close to matter anyway.

Something that would help: having a test that shows non-deterministic behavior due to this race.
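
The non-deterministic failure itself depends on thread timing, but the underlying state can be shown deterministically in a single thread. The sketch below is an illustration only, not a test from the library: the class name CancelledKeyDemo and the method run are hypothetical, and a java.nio.channels.Pipe stands in for a socket. It cancels a key and then inspects selector.keys() before and after the next selection operation, which is when the JDK documentation says cancelled keys are removed from the selector's key sets.

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical illustration class; not part of the library discussed above.
public class CancelledKeyDemo {

    // Returns {listed in keys() after cancel, isValid() after cancel,
    //          interestOps(int) threw, listed in keys() after selectNow()}.
    static boolean[] run() throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

                // Same effect on the key as closing the channel from another thread.
                key.cancel();

                boolean listedAfterCancel = selector.keys().contains(key);
                boolean validAfterCancel = key.isValid();

                boolean threw = false;
                try {
                    key.interestOps(SelectionKey.OP_READ);
                } catch (CancelledKeyException e) {
                    threw = true;
                }

                // A selection operation processes the cancelled-key set.
                selector.selectNow();
                boolean listedAfterSelect = selector.keys().contains(key);

                return new boolean[] {listedAfterCancel, validAfterCancel, threw, listedAfterSelect};
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run();
        System.out.println("in keys() after cancel: " + r[0]);
        System.out.println("isValid() after cancel: " + r[1]);
        System.out.println("interestOps threw: " + r[2]);
        System.out.println("in keys() after selectNow(): " + r[3]);
    }
}
```

In the multi-threaded case the cancel can land between any two statements of the loop over selector.keys(), so an isValid() check before interestOps would narrow the window but not close it.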