AsynchronousTlsChannelGroup#processPendingInterests can throw CancelledKeyException
In tests of the MongoDB Java driver that use this library, I’ve seen occasional, non-deterministic failures where AsynchronousTlsChannelGroup#processPendingInterests throws CancelledKeyException, causing AsynchronousTlsChannelGroup.loop to exit. It happens in cases where we are forcing the server to close the socket in order to test failure scenarios.
I’m not exactly sure why this is happening, but I do see that AsynchronousTlsChannelGroup.loop already wraps calls to java.nio.channels.SelectionKey#interestOps(int) in a try/catch for CancelledKeyException. Would it make sense to do the same in AsynchronousTlsChannelGroup#processPendingInterests? For example:
private void processPendingInterests() {
    for (SelectionKey key : selector.keys()) {
        RegisteredSocket socket = (RegisteredSocket) key.attachment();
        int pending = socket.pendingOps.getAndSet(0);
        if (pending != 0) {
            try {
                key.interestOps(key.interestOps() | pending);
            } catch (CancelledKeyException e) {
                // ignore
            }
        }
    }
}

@martinandersson Below is my explanation of the problem and the fix.
The method AsynchronousTlsChannelGroup.RegisteredSocket.close may be called by any thread as a result of that thread calling AsynchronousTlsChannel.close. AsynchronousTlsChannelGroup.RegisteredSocket.close calls SelectionKey.cancel, which, per its documentation, invalidates the key and adds it to the selector's cancelled-key set.
If we now look at Selector, we can see that a cancelled key is removed from the selector's key sets only during the next selection operation.
This means that selector.keys in AsynchronousTlsChannelGroup.processPendingInterests may return cancelled SelectionKeys. Calling key.interestOps on such SelectionKeys results in CancelledKeyException, as per the documentation of SelectionKey.interestOps.
Thus, depending on how AsynchronousTlsChannelGroup and AsynchronousTlsChannel are used in a program, the program may have a race condition. Two approaches are possible:
- synchronize calls to the SelectionKey.cancel, Selector.select, Selector.keys, SelectionKey.isValid and SelectionKey.interestOps methods in AsynchronousTlsChannelGroup in such a way that the race condition can no longer occur;
- catch CancelledKeyException when it happens as a result of a program having the race condition.
The second approach seems (maybe surprisingly) preferable in this case because it is both simpler and introduces less performance overhead, assuming that CancelledKeyException is thrown much more rarely than SelectionKey.cancel is called.
The Selector API is already racy here. But a lot of “closing workflows” are racy and benign, as typically not much that matters happens after a close anyway.
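
For contrast with the try/catch fix, the first approach (synchronizing the calls) could look roughly like the sketch below. GuardedKey is a hypothetical wrapper invented for illustration, not a class from the library. It also assumes that every cancellation goes through it: closing the channel or the selector directly invalidates the key without taking the lock, which is part of why this approach is harder to get right.

```java
import java.nio.channels.SelectionKey;

// Hypothetical wrapper (not part of the library): serializes cancel() and
// interest-ops updates on one lock so the update never sees a key that was
// cancelled between the validity check and the interestOps call.
final class GuardedKey {
    private final SelectionKey key;
    private final Object lock = new Object();

    GuardedKey(SelectionKey key) {
        this.key = key;
    }

    // Every cancellation must go through this method for the guard to hold.
    void cancel() {
        synchronized (lock) {
            key.cancel();
        }
    }

    // Adds the given ops to the interest set; returns false if the key is
    // no longer valid, in which case nothing is changed.
    boolean addInterestOps(int ops) {
        synchronized (lock) {
            if (!key.isValid()) {
                return false;
            }
            key.interestOps(key.interestOps() | ops);
            return true;
        }
    }
}
```

Every close and every interest-ops update now pays for a lock acquisition, whereas the try/catch variant only pays when the exception is actually thrown.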
Something that would help: a test that shows non-deterministic behavior due to this race.
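
The race itself depends on timing, but the condition behind it can be reproduced deterministically on a single thread. The following is a minimal sketch, not a test from the library: CancelledKeyDemo and observe are hypothetical names, and a Pipe is used only as a convenient way to get a selectable channel without any networking. It cancels a key, checks that Selector.keys still lists it, and checks that interestOps then throws.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class: shows that a cancelled key stays in keys()
// until the next selection operation.
final class CancelledKeyDemo {

    // Returns {stillListedAfterCancel, interestOpsThrew, removedAfterSelect}.
    static boolean[] observe() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, 0);

                // Stands in for RegisteredSocket.close being called elsewhere.
                key.cancel();

                // No selection operation has run yet, so the key is still listed.
                boolean stillListed = selector.keys().contains(key);

                boolean threw = false;
                try {
                    key.interestOps(SelectionKey.OP_READ);
                } catch (CancelledKeyException e) {
                    threw = true;
                }

                // A selection operation flushes the cancelled-key set.
                selector.selectNow();
                boolean removed = !selector.keys().contains(key);

                return new boolean[] {stillListed, threw, removed};
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("still listed after cancel: " + r[0]);
        System.out.println("interestOps threw: " + r[1]);
        System.out.println("removed after selectNow: " + r[2]);
    }
}
```

In the real failure the cancel happens on another thread between two selection operations, so whether processPendingInterests sees the stale key depends on scheduling. A multi-threaded stress test would still be needed to show the non-determinism itself.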