OOM due to SSL key material being cached on every new connection when using OpenSslCachingX509KeyManagerFactory
Expected behavior
Recently, we ran into an OOM issue after switching from the JDK SSL implementation to OpenSSL in Netty.
We're using OpenSslCachingX509KeyManagerFactory explicitly, so Netty uses OpenSslCachingKeyMaterialProvider to cache the key material and reduce the overhead of parsing the certificate chain and private key each time the material is generated.
We expected a performance improvement, not an OOM.
Actual behavior
However, under a TLS connection stress test, we saw memory usage increase linearly until the process eventually ran out of memory.
After debugging the Netty and OpenJDK SSL code, we found the cause. For every new connection, the handshake certificate selection callback, OpenSslClientCertificateCallback, is invoked and looks up the key material for the chosen alias in the cache. If the alias is not cached, it asks the key manager for a matching alias based on the server's certificate chain, and the key manager returns a newly created alias in the format seq_id.builderIndex.keyStoreAlias, e.g. 924450.0.key. Netty then parses the chain and key and puts the result into the cache under this new alias. The cache retains a reference (refCnt) to the key material, which prevents the native memory from being released. Because every connection produces a new alias, the cache keeps growing, and that is why we eventually saw the OOM.
Switching to OpenSslX509KeyManagerFactory solved the problem.
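The growth pattern described above can be illustrated with a small, JDK-only simulation. This is a sketch under stated assumptions, not Netty code: `KeyMaterial`, `KeyMaterialCache`, `stableAliases` and `unstableAliases` are hypothetical names, the map stands in for the caching key material provider, and the unstable alias format mimics the `seq_id.builderIndex.keyStoreAlias` pattern quoted above. Each loop iteration plays the role of one certificate selection callback on a new connection.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class AliasCacheDemo {

    // Hypothetical stand-in for parsed key material. In Netty the real object is
    // reference counted and backed by native memory; here it only holds the alias.
    static final class KeyMaterial {
        final String alias;
        KeyMaterial(String alias) { this.alias = alias; }
    }

    // Hypothetical alias-keyed cache: material is "parsed" once per alias and
    // never evicted, so every distinct alias adds a permanent entry.
    static final class KeyMaterialCache {
        private final Map<String, KeyMaterial> cache = new ConcurrentHashMap<>();

        KeyMaterial get(String alias) {
            // Only a cache miss creates new material.
            return cache.computeIfAbsent(alias, KeyMaterial::new);
        }

        int size() { return cache.size(); }
    }

    // Mimics a key manager with stable aliases: the keystore alias is returned as-is.
    static Supplier<String> stableAliases(String keyStoreAlias) {
        return () -> keyStoreAlias;
    }

    // Mimics a key manager that prefixes a fresh sequence id on every selection,
    // producing "1.0.key", "2.0.key", ...
    static Supplier<String> unstableAliases(String keyStoreAlias) {
        AtomicLong uid = new AtomicLong();
        return () -> uid.incrementAndGet() + ".0." + keyStoreAlias;
    }

    // Simulates one alias selection plus cache lookup per new connection and
    // returns how many entries the cache holds afterwards.
    static int simulate(Supplier<String> aliasChooser, int connections) {
        KeyMaterialCache cache = new KeyMaterialCache();
        for (int i = 0; i < connections; i++) {
            cache.get(aliasChooser.get());
        }
        return cache.size();
    }

    public static void main(String[] args) {
        int connections = 10_000;
        System.out.println("stable aliases:   " + simulate(stableAliases("key"), connections) + " cached entries");
        System.out.println("unstable aliases: " + simulate(unstableAliases("key"), connections) + " cached entries");
    }
}
```

With a stable alias the cache ends up with a single entry regardless of the number of connections; with a fresh alias per selection it ends up with one entry per connection, and in the real provider each of those entries keeps native memory alive.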
Steps to reproduce
Use OpenSslCachingX509KeyManagerFactory to set up the SSLContext, then keep issuing lots of TLS connection requests.
Minimal yet complete reproducer code (or URL to code)
Netty version
We’re using 4.1.36.Final.
JVM version (e.g. java -version)
java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)
Top GitHub Comments
@lvfangmin after some debugging, I think I also know why people usually don't see this problem.
By default we use "SunX509" as the algorithm when creating the KeyManagerFactory. With this algorithm the JDK uses SunX509KeyManagerImpl, which returns "stable" aliases, so the caching works as expected. You specify another algorithm, so you end up with X509KeyManagerImpl, which does not provide stable aliases.

So to fix this, I think we should do two things:
- …
- … X509KeyManagerImpl is used, we should not cache unless explicitly told to.

WDYT?
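The difference between the two algorithms can be checked with plain JSSE APIs. The sketch below, assuming the default SunJSSE provider, builds a KeyManagerFactory for "SunX509" and for "PKIX" (which SunJSSE backs with X509KeyManagerImpl) over an empty keystore and prints the key manager class each one produces. The `shouldCache` helper is hypothetical and only illustrates the caching policy proposed in the comment; it is not Netty's API or its actual fix.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.X509KeyManager;

public class KeyManagerAlgorithms {

    // Builds a KeyManagerFactory for the given algorithm over an empty in-memory
    // PKCS12 keystore and returns its X509KeyManager. An empty keystore is enough
    // to see which implementation class the provider chooses.
    static X509KeyManager keyManagerFor(String algorithm) {
        try {
            KeyStore ks = KeyStore.getInstance("PKCS12");
            ks.load(null, null);
            KeyManagerFactory kmf = KeyManagerFactory.getInstance(algorithm);
            kmf.init(ks, "changeit".toCharArray());
            for (KeyManager km : kmf.getKeyManagers()) {
                if (km instanceof X509KeyManager) {
                    return (X509KeyManager) km;
                }
            }
            throw new IllegalStateException("no X509KeyManager for " + algorithm);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("cannot create key manager for " + algorithm, e);
        }
    }

    // Hypothetical policy: cache key material by alias only when explicitly requested,
    // or when the key manager is not the JDK's X509KeyManagerImpl, which hands out a
    // new alias on every selection. Matching a JDK-internal class name is fragile and
    // is used here only because the comment identifies the class by that name.
    static boolean shouldCache(X509KeyManager km, boolean explicitlyRequested) {
        if (explicitlyRequested) {
            return true;
        }
        return !"sun.security.ssl.X509KeyManagerImpl".equals(km.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println("default algorithm: " + KeyManagerFactory.getDefaultAlgorithm());
        for (String algorithm : new String[] {"SunX509", "PKIX"}) {
            X509KeyManager km = keyManagerFor(algorithm);
            System.out.println(algorithm + " -> " + km.getClass().getName()
                    + ", cache by default: " + shouldCache(km, false));
        }
    }
}
```

Running it on a stock JDK shows which implementation each algorithm name selects, which is what decides whether aliases are stable enough for alias-keyed caching.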
@lvfangmin thanks, will have a look.