OOM due to SSL key material being cached on every new connection when using OpenSslCachingX509KeyManagerFactory

Expected behavior

We recently hit an OOM issue after switching from the JDK SSL provider to OpenSSL in Netty.

We’re using OpenSslCachingX509KeyManagerFactory explicitly, so Netty uses OpenSslCachingKeyMaterialProvider to cache the key material and avoid the overhead of parsing the chain and the key each time the material is generated.

We expect to see a performance improvement, but we shouldn’t see an OOM.

Actual behavior

Under a stress test of TLS connections, however, memory usage increased linearly until the process eventually ran out of memory.

After debugging the Netty and OpenJDK SSL code, we found the cause. On every new connection, the handshake certificate-selection callback OpenSslClientCertificateCallback is called and tries to look up the key material for the alias in the cache. If the alias is not cached, it tries to find a matching alias from the server cert chain, which creates a new alias in the format seq_id.builderIndex.keyStoreAlias, such as 924450.0.key. It then parses the chain and key and puts the result into the cache under the new alias. The cache entry retains the refCnt of the key material and prevents the native memory from being freed, which is why we eventually saw the OOM.
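
The growth pattern described above can be illustrated with a small, self-contained simulation. This is only a sketch and not Netty's code: `FakeKeyMaterial`, `unstableAlias`, `stableAlias`, and `simulate` are hypothetical names, and a plain heap array stands in for the native key material. It assumes an unbounded map keyed by alias, with one lookup per handshake.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class UnstableAliasCacheDemo {
    /** Hypothetical stand-in for key material; the real material holds native memory. */
    static final class FakeKeyMaterial {
        final byte[] payload = new byte[1024];
    }

    /** Unbounded cache keyed by alias (an assumption made for this sketch). */
    private static final Map<String, FakeKeyMaterial> CACHE = new ConcurrentHashMap<>();
    private static final AtomicLong SEQ = new AtomicLong();

    /** Returns a fresh alias per call, shaped like seq_id.builderIndex.keyStoreAlias. */
    static String unstableAlias() {
        return SEQ.incrementAndGet() + ".0.key";
    }

    /** Returns the same alias on every call. */
    static String stableAlias() {
        return "key";
    }

    /** Simulates the given number of handshakes and returns the final cache size. */
    public static int simulate(int handshakes, boolean stable) {
        CACHE.clear();
        for (int i = 0; i < handshakes; i++) {
            String alias = stable ? stableAlias() : unstableAlias();
            // A cache miss "parses" new material and keeps it alive in the map.
            CACHE.computeIfAbsent(alias, a -> new FakeKeyMaterial());
        }
        return CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println("stable aliases:   " + simulate(10_000, true));  // prints 1
        System.out.println("unstable aliases: " + simulate(10_000, false)); // prints 10000
    }
}
```

With a stable alias the cache holds a single entry no matter how many handshakes occur. With a new alias per handshake, every connection adds an entry that is never evicted, which matches the linear memory growth reported here.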

Changing to use OpenSslX509KeyManagerFactory solved this problem.

Steps to reproduce

Use OpenSslCachingX509KeyManagerFactory to set up the SSLContext, then keep issuing a large number of TLS connection requests.

Minimal yet complete reproducer code (or URL to code)

Netty version

We’re using 4.1.36.Final.

JVM version (e.g. java -version)

java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 23 (13 by maintainers)

Top GitHub Comments

2 reactions
normanmaurer commented, Nov 6, 2019

@lvfangmin after some debugging I think I also know why people usually do not see this problem.

By default we use “SunX509” as algorithm when creating the KeyManagerFactory. When this is used the JDK uses SunX509KeyManagerImpl. This one uses “stable” aliases and so the caching works as expected. You specify another algorithm and so it ends up using X509KeyManagerImpl which does not provide stable aliases.
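
One way to check which key manager implementation a given algorithm yields is to ask the JDK directly. The sketch below uses only standard `javax.net.ssl` calls; the class name `KeyManagerImplCheck` and the helper `keyManagerFor` are hypothetical. It assumes an Oracle/OpenJDK runtime, and it uses "PKIX" only as an example of an algorithm other than "SunX509", since the comment does not say which algorithm was actually configured.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.X509KeyManager;

public class KeyManagerImplCheck {
    /** Returns the X509KeyManager the JDK creates for the given algorithm name. */
    public static X509KeyManager keyManagerFor(String algorithm) {
        try {
            KeyManagerFactory kmf = KeyManagerFactory.getInstance(algorithm);
            // No key store is needed just to inspect the implementation class.
            kmf.init(null, null);
            for (KeyManager km : kmf.getKeyManagers()) {
                if (km instanceof X509KeyManager) {
                    return (X509KeyManager) km;
                }
            }
            throw new IllegalStateException("No X509KeyManager for " + algorithm);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Algorithm not usable: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default -> " + KeyManagerFactory.getDefaultAlgorithm());
        // "PKIX" is an assumption: one example of a non-"SunX509" algorithm on OpenJDK.
        for (String algorithm : new String[] {"SunX509", "PKIX"}) {
            System.out.println(algorithm + " -> "
                    + keyManagerFor(algorithm).getClass().getName());
        }
    }
}
```

Printing the class names on the JVM in question shows whether a configuration ends up on the implementation with stable aliases or on the one without them.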

So to fix this I think we should do two things:

  • If X509KeyManagerImpl is used, we should not cache unless explicitly told to
  • Ensure the cache cannot grow without bounds…

WDYT ?
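
As an illustration of the second point, a bounded cache could look roughly like the sketch below. It is not the fix that was applied in Netty: `BoundedMaterialCache` and `Material` are hypothetical names, the least-recently-used eviction policy is an assumption chosen for the example, and a simple counter stands in for real reference counting of native memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedMaterialCache {
    /** Hypothetical stand-in for reference-counted native key material. */
    public static final class Material {
        private int refCnt = 1;

        void release() {
            refCnt--;
        }

        public boolean isFreed() {
            return refCnt == 0;
        }
    }

    private final Map<String, Material> entries;
    private int released;

    public BoundedMaterialCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<String, Material>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Material> eldest) {
                if (size() > maxEntries) {
                    // Drop the cache's reference so the material can be freed.
                    eldest.getValue().release();
                    released++;
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached material for the alias, creating it on a miss. */
    public synchronized Material get(String alias) {
        Material material = entries.get(alias);
        if (material == null) {
            material = new Material();
            entries.put(alias, material);
        }
        return material;
    }

    public synchronized int size() {
        return entries.size();
    }

    public synchronized int releasedCount() {
        return released;
    }

    public static void main(String[] args) {
        BoundedMaterialCache cache = new BoundedMaterialCache(100);
        for (int i = 1; i <= 10_000; i++) {
            cache.get(i + ".0.key"); // a new alias per simulated handshake
        }
        // prints "100 cached, 9900 released"
        System.out.println(cache.size() + " cached, " + cache.releasedCount() + " released");
    }
}
```

Even with a new alias on every handshake, the cache stays at its configured size and evicted entries give up their reference. A real implementation would also need callers to retain the material while using it, so that eviction cannot free memory that is still in use.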

2 reactions
normanmaurer commented, Nov 5, 2019

@lvfangmin thanks, will have a look
