Severe memory leak when setUseEngineSocketByDefault(true)

I’ve included below the source code of a simple test that creates 100k localhost SSL connections.

If it is run without Conscrypt, it uses approximately 330 MB of resident memory (RSS) on Linux. With Conscrypt, it uses approximately 3.7 GB of resident memory, even though -Xmx is set to 512M.
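To make the gap between the Java heap and the process's resident memory visible from inside the JVM, something like the following could be run alongside the test. This is only a minimal sketch: the class name RssProbe and its methods are hypothetical, and it assumes a Linux system where /proc/self/status exposes a VmRSS line (it returns -1 anywhere else).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RssProbe {

  // Returns the resident set size in kB as reported by /proc/self/status,
  // or -1 if the file is missing or unreadable (for example on non-Linux systems).
  public static long residentKb() {
    Path status = Path.of("/proc/self/status");
    if (!Files.isReadable(status)) {
      return -1;
    }
    try {
      for (String line : Files.readAllLines(status)) {
        if (line.startsWith("VmRSS:")) {
          // The line looks like "VmRSS:     123456 kB".
          String[] parts = line.trim().split("\\s+");
          return Long.parseLong(parts[1]);
        }
      }
    } catch (IOException | RuntimeException e) {
      return -1;
    }
    return -1;
  }

  // Returns the Java heap currently in use, in kB. This is bounded by -Xmx.
  public static long heapUsedKb() {
    Runtime rt = Runtime.getRuntime();
    return (rt.totalMemory() - rt.freeMemory()) / 1024;
  }

  public static void main(String[] args) {
    System.out.println("heap used (kB): " + heapUsedKb());
    System.out.println("heap max  (kB): " + Runtime.getRuntime().maxMemory() / 1024);
    System.out.println("resident  (kB): " + residentKb());
  }
}
```

If the resident figure keeps growing while the heap figure stays well under the -Xmx limit, the growth is happening outside the Java heap, which is what a native allocation leak would look like.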

I’ve tested this on both Ubuntu 18 and CentOS 7, on two different machines. One machine is an Ubuntu VM with 2 virtual cores on an Apple laptop with 6 cores, and the other has dual Xeons, each with 8 cores. Both tests used OpenJDK 14.0.1 as downloaded from https://jdk.java.net/14/ and conscrypt-openjdk-2.4.0-linux-x86_64 from the Maven repo.

(Note that this memory leak happens when testing with both conscrypt 2.4.0 and 2.2.1.)

To make the code run, a keystore must first be created using the command: keytool -genkey -keyalg EC -keystore keystore -groupname secp256r1 -alias localhost -keypass password -storepass password -dname "CN=localhost,OU=X,O=X,L=X,S=X,C=X"

The Conscrypt jar has to be available, which can be retrieved with wget https://repo1.maven.org/maven2/org/conscrypt/conscrypt-openjdk/2.4.0/conscrypt-openjdk-2.4.0-linux-x86_64.jar

The code is compiled using javac Main.java -cp conscrypt-openjdk-2.4.0-linux-x86_64.jar

The code is run using java -Xmx512M -cp conscrypt-openjdk-2.4.0-linux-x86_64.jar:. -Djavax.net.ssl.keyStore=keystore -Djavax.net.ssl.keyStorePassword=password -Djavax.net.ssl.trustStore=keystore -Djavax.net.ssl.trustStorePassword=password Main

The code, to be placed inside Main.java:

import org.conscrypt.Conscrypt;
import org.conscrypt.OpenSSLProvider;

import java.io.*;
import javax.net.*;
import javax.net.ssl.*;
import java.net.*;
import java.security.Security;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {

  public static void main(String[] args) throws Exception {

    System.out.println("pid: " + ProcessHandle.current().pid());

    System.out.println("Use conscrypt? (y/n)");
    boolean useConscrypt = new Scanner(System.in).nextLine().toLowerCase().startsWith("y");

    if(useConscrypt) {
      Conscrypt.setUseEngineSocketByDefault(true);
      Security.insertProviderAt(new OpenSSLProvider(), 1);
    }

    int listenPort = 12345;

    ServerSocketFactory serverSocketFactory = SSLServerSocketFactory.getDefault();
    ServerSocket serverSocket = serverSocketFactory.createServerSocket(listenPort);

    new Thread(()->{
      try {
        while(true) {
          try(var socket = serverSocket.accept()) {
            try(var out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())))) {
              out.println("Hello");
              out.flush();
            }
          }
        }
      } catch (IOException e) {
        e.printStackTrace();
      }
    }).start();


    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();

    int concurrentThreads = Runtime.getRuntime().availableProcessors();
    ExecutorService executor = Executors.newFixedThreadPool(concurrentThreads);

    var count = new AtomicInteger(0);
    var totalConnections = 100*1000;

    new Thread(()->{
      try {
        while(true) {
          Thread.sleep(1000);
          System.out.println("Count: " + count.get());
        }
      }
      catch (InterruptedException e) {}
    }).start();

    for(int i=0; i<totalConnections; i++) {
      executor.submit(() -> {
        try {
          SSLSocket socket = (SSLSocket) factory.createSocket("localhost", listenPort);
          socket.startHandshake();
          BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
          while ((in.readLine()) != null);

          in.close();
          socket.close();
          count.incrementAndGet();
        }
        catch (IOException e) {
          e.printStackTrace();
        }
      });
    }

    executor.shutdown();
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);

  }

}
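The client task above closes its socket only on the success path, so an exception during the handshake or read leaves cleanup to the garbage collector. A variant using try-with-resources might look like the sketch below. The class name ClientTask and the method readAllLines are hypothetical, and nothing in the thread establishes that this changes the leak behaviour; it only removes one variable from the test. The explicit startHandshake() call is omitted because an SSLSocket performs the handshake implicitly on the first read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import javax.net.SocketFactory;

public class ClientTask {

  // Connects, reads lines until the peer closes the stream, and returns the line count.
  // The reader and the socket are closed even if connecting or reading throws.
  public static int readAllLines(SocketFactory factory, String host, int port)
      throws IOException {
    try (Socket socket = factory.createSocket(host, port);
         BufferedReader in =
             new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
      int lines = 0;
      while (in.readLine() != null) {
        lines++;
      }
      return lines;
    }
  }
}
```

In Main, the body of the submitted task would then reduce to a call such as ClientTask.readAllLines(factory, "localhost", listenPort) followed by count.incrementAndGet(), with the existing catch block kept for logging.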

Thanks in advance for any assistance.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 52 (14 by maintainers)

Top GitHub Comments

3 reactions
zekronium commented, Feb 28, 2021

[Image: chart of hourly resident memory usage]

This is a chart of hourly recorded memory usage of a Jetty web server over a period of 470 hours, starting from the time when the Jetty process was first launched. The JVM Xmx is set to 3GB, and it’s running on Ubuntu 20.04 LTS.

The web server uses Conscrypt both for incoming and outgoing connections, including using Conscrypt for outgoing TLS JDBC connections (by installing Conscrypt as the default security provider).

It is using the CI build based on commit 52f3cf1 (January 28, 2021).

You can see that suddenly, in the last few hours, there was a 2.6 GB spike in res mem usage (from about 8.3GB to 10.9GB).

This coincided with a database upgrade, which involved the DB being shut down twice (each time for about 10 minutes). This would have meant that new JDBC connections via Conscrypt could not have been made to the DB.

Logs show that approx. 1700 "Socket fail to connect to host:[redacted].com, port:[redacted]. Connection refused" exceptions occurred during the DB downtime. The JDBC connection pool reported a few dozen "Broken pipe", "Connection is closed" and "Connection was killed" exceptions.

It’s possible that the leak is due to existing outgoing JDBC connections being suddenly dropped, or due to new outgoing JDBC connections failing to be established.

It’s also possible that the leak was caused by the server accepting lots of incoming HTTPS requests which were blocking due to the JDBC connection pool locking up as it waited to re-establish connections to the DB.

Note that several hours have passed since the DB downtime happened, and although the Jetty server has gone back to normal, and is able to serve HTTPS connections and communicate with the DB via JDBC, its res mem footprint has stayed permanently high.
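One way to test the "failed outgoing connections" hypothesis in isolation would be to hammer a port that nothing is listening on and watch the process's resident memory while doing so. The following is a rough sketch under that assumption; the class RefusedConnectionProbe and its methods are hypothetical names, and whether refused connections actually trigger the leak is not confirmed anywhere in this thread.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;

public class RefusedConnectionProbe {

  // Binds an ephemeral port and releases it immediately, so that the returned
  // port was free a moment ago and (most likely) has no listener afterwards.
  public static int findClosedPort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  // Attempts to connect to localhost:port the given number of times and
  // returns how many attempts failed with an IOException.
  public static int countFailedConnects(SocketFactory factory, int port, int attempts) {
    int failures = 0;
    for (int i = 0; i < attempts; i++) {
      try (Socket socket = factory.createSocket("localhost", port)) {
        // Reaching this point means something is listening after all;
        // the socket is closed again by try-with-resources.
      } catch (IOException e) {
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) throws IOException {
    int port = findClosedPort();
    // Uses whichever provider is installed as the default; to exercise Conscrypt,
    // install it first as in the Main test from the original report.
    int failures = countFailedConnects(SSLSocketFactory.getDefault(), port, 2000);
    System.out.println("Failed connection attempts: " + failures);
  }
}
```

Comparing resident memory before and after such a run, with and without Conscrypt installed, would show whether refused connections alone are enough to account for growth of the kind described above.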

Sorry for the late join in.

We also noticed this happening in production. Our services only make outgoing connections, and the memory usage usually spikes when the targets time out the HTTP request or fail to connect in general.

While testing, I also noticed a weird trend which may or may not be related: the memory usage, and especially the build-up of “leaked memory”, is much lower on Oracle JDK than on AdoptOpenJDK (both 11).

This could be a coincidence, but it is a pattern we observed: the AdoptOpenJDK instances would need to be restarted or get OOM-killed every 10-15 hours, whereas the Oracle JDK ones would be able to run for a couple of days without much trouble (still leaking memory, of course).

I have set up a proper test environment with test conditions that are as similar as possible, and will report back on whether this was just a fluke in our observations.

3 reactions
knaccc commented, Dec 21, 2020

@yschimke

> do you think it’s definitely a Conscrypt leak

Yes, because if I simply remove Conscrypt as an OpenSSLProvider on the server, there is no longer a memory leak. Additionally, you will see earlier in this thread that I’ve used jemalloc to demonstrate that the leak is as a result of calls to native code (OPENSSL_malloc) rather than Java object leaks.

> It doesn’t look like Main11 is designed to handle errors cleanly though

FYI, my real-world code doesn’t look anything like Main11. Instead, it retrieves images from a list of specified web pages. That code uses Apache HttpClient, which in turn will use Conscrypt if it is registered as a provider. It has Xmx set to 3GB, so garbage collection should have kicked in long before it exceeded 10GB of res mem.

> a useful mid-point would be your script producing a known list of 10k hostnames

It’s possible that the leak happens due to unexpected connection issues, or a variety of uncommon circumstances that may only show themselves in the real world. It’s annoying that the leak is so slow, which makes it much harder to iteratively narrow down a list.
