Deadlock when starting containers from multiple threads

Problem

When I start two containers from separate threads, I get a synchronisation deadlock. The two locks in question are in DockerClientFactory and LocalImagesCache.

The LocalImagesCache lock is held by thread 1 (“main”) which is in the process of initialising the logger instance of an ElasticsearchContainer. The DockerClientFactory lock is held by thread 2 (“Thread-0”) which is in the process of starting a container.
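
The lock-order inversion described above can be reproduced in miniature without Testcontainers. The following is only a sketch under stated assumptions: `clientFactoryLock` and `imagesCacheLock` are hypothetical stand-ins for the two real locks, and `tryLock` with a timeout replaces `synchronized` so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins only; this is not Testcontainers code.
class LockOrderInversionDemo {

    /** Returns how many of the two threads failed to obtain their second lock. */
    static int countBlockedThreads() {
        ReentrantLock clientFactoryLock = new ReentrantLock(); // stands in for the DockerClientFactory lock
        ReentrantLock imagesCacheLock = new ReentrantLock();   // stands in for the LocalImagesCache monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        CountDownLatch bothAttemptsDone = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Like "main" in the report: holds the cache lock, then wants the factory lock.
        Thread t1 = new Thread(() -> acquireInOrder(
                imagesCacheLock, clientFactoryLock, bothHoldFirstLock, bothAttemptsDone, blocked));
        // Like "Thread-0" in the report: holds the factory lock, then wants the cache lock.
        Thread t2 = new Thread(() -> acquireInOrder(
                clientFactoryLock, imagesCacheLock, bothHoldFirstLock, bothAttemptsDone, blocked));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirstLock,
                                       CountDownLatch bothAttemptsDone,
                                       AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until each thread holds its first lock.
            bothHoldFirstLock.countDown();
            bothHoldFirstLock.await();
            // With synchronized this would block forever; the timeout lets the demo end.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until both attempts have finished.
            bothAttemptsDone.countDown();
            bothAttemptsDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

Because each thread keeps its first lock until both attempts have finished, neither thread can obtain its second lock. That is the same cycle the two stack traces show.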

Stack of thread 1

"main" #1 prio=5 os_prio=31 cpu=539.66ms elapsed=14.14s tid=0x000000012901c000 nid=0x2803 waiting for monitor entry  [0x000000016db54000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:179)
	- waiting to lock <0x000000061a800000> (a [Ljava.lang.Object;)
	at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:102)
	at com.github.dockerjava.api.DockerClientDelegate.listImagesCmd(DockerClientDelegate.java:168)
	at org.testcontainers.images.LocalImagesCache.maybeInitCache(LocalImagesCache.java:69)
	- locked <0x000000061a8001b8> (a org.testcontainers.images.LocalImagesCache)
	at org.testcontainers.images.LocalImagesCache.get(LocalImagesCache.java:33)
	at org.testcontainers.images.AbstractImagePullPolicy.shouldPull(AbstractImagePullPolicy.java:18)
	at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:70)
	at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:28)
	at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:17)
	- locked <0x000000061a8003b0> (a java.util.concurrent.atomic.AtomicReference)
	at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:39)
	at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1330)
	at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:640)
	at org.testcontainers.elasticsearch.ElasticsearchContainer.<init>(ElasticsearchContainer.java:85)
	at com.example.testsupport.ElasticsearchExtension.lambda$startContainerAsync$2(ElasticsearchExtension.java:149)
	at com.example.testsupport.ElasticsearchExtension$$Lambda$392/0x0000000800db6978.apply(Unknown Source)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore.lambda$getOrComputeIfAbsent$4(ExtensionValuesStore.java:86)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore$$Lambda$369/0x0000000800d9c620.get(Unknown Source)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore$MemoizingSupplier.computeValue(ExtensionValuesStore.java:223)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore$MemoizingSupplier.get(ExtensionValuesStore.java:211)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore$StoredValue.evaluate(ExtensionValuesStore.java:191)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore$StoredValue.access$100(ExtensionValuesStore.java:171)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore.getOrComputeIfAbsent(ExtensionValuesStore.java:89)
	at org.junit.jupiter.engine.execution.ExtensionValuesStore.getOrComputeIfAbsent(ExtensionValuesStore.java:93)
	at org.junit.jupiter.engine.execution.NamespaceAwareStore.getOrComputeIfAbsent(NamespaceAwareStore.java:61)
	at com.example.testsupport.ElasticsearchExtension.startContainerAsync(ElasticsearchExtension.java:148)
	at com.example.testsupport.ElasticsearchExtension.beforeAll(ElasticsearchExtension.java:55)
	at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllCallbacks$10(ClassBasedTestDescriptor.java:381)
	at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor$$Lambda$358/0x0000000800d99dc8.execute(Unknown Source)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)

Stack of thread 2

"Thread-0" #17 prio=5 os_prio=31 cpu=134.90ms elapsed=13.35s tid=0x0000000129a0b000 nid=0x6703 waiting for monitor entry  [0x0000000179472000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.testcontainers.images.LocalImagesCache.maybeInitCache(LocalImagesCache.java:61)
	- waiting to lock <0x000000061a8001b8> (a org.testcontainers.images.LocalImagesCache)
	at org.testcontainers.images.LocalImagesCache.get(LocalImagesCache.java:33)
	at org.testcontainers.images.AbstractImagePullPolicy.shouldPull(AbstractImagePullPolicy.java:18)
	at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:70)
	at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:28)
	at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:17)
	- locked <0x0000000605918608> (a java.util.concurrent.atomic.AtomicReference)
	at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:39)
	at org.testcontainers.utility.RyukResourceReaper.maybeStart(RyukResourceReaper.java:95)
	- locked <0x00000006058ca580> (a org.testcontainers.utility.RyukResourceReaper)
	at org.testcontainers.utility.RyukResourceReaper.getLabels(RyukResourceReaper.java:79)
	at org.testcontainers.DockerClientFactory.runInsideDocker(DockerClientFactory.java:374)
	at org.testcontainers.DockerClientFactory.runInsideDocker(DockerClientFactory.java:368)
	at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:238)
	- locked <0x000000061a800000> (a [Ljava.lang.Object;)
	at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:102)
	at com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:108)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:325)
	at com.righthub.publicipdata.pipeline.testsupport.TestContainerWrapper$$Lambda$386/0x0000000800db5380.run(Unknown Source)
	at java.lang.Thread.run(java.base@17.0.1/Thread.java:833)

Here is a minimal reproducer:

public static void main(String[] args) throws InterruptedException {
    var solrContainer = new SolrContainer(DockerImageName.parse("solr:8.11.1-slim"));
    var solrThread = new Thread(solrContainer::start);
    solrThread.start();

    var elasticsearchContainer = new ElasticsearchContainer(DockerImageName.parse("docker.elastic.co/elasticsearch/elasticsearch:8.1.3")); // deadlock here
    var elasticsearchThread = new Thread(elasticsearchContainer::start);
    elasticsearchThread.start();

    solrThread.join();
    elasticsearchThread.join();
}

Proposed solution

I’m not very familiar with the code base of Testcontainers, but here is the smallest change I can think of to solve this:

In LocalImagesCache.java:

  1. Change line 60 from private synchronized boolean maybeInitCache() { to private synchronized boolean maybeInitCache(DockerClient dockerClient) {.
  2. Change line 33 from maybeInitCache(); to maybeInitCache(DockerClientFactory.instance().client());.
  3. Change line 38 from if (!maybeInitCache()) { to if (!maybeInitCache(DockerClientFactory.instance().client())) {.
  4. Remove the field dockerClient (which is currently initialised to DockerClientFactory.lazyClient()).

With this change:

  • The public interface of LocalImagesCache is left unmodified.
  • The synchronisation boundaries of LocalImagesCache::maybeInitCache are left unmodified.
  • The lock in DockerClientFactory::client is acquired before the lock in LocalImagesCache::maybeInitCache, which prevents this deadlock from occurring.
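
The shape of this proposal can be sketched with stand-in classes. This is neither the Testcontainers source nor the fix that was eventually merged: `FakeDockerClient`, `client()`, `hasImage()`, and both lock fields are hypothetical names chosen for illustration. The only point it illustrates is that the client is resolved before the cache's synchronized section is entered, so the cache lock is never held while waiting for the factory lock.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins that mirror the proposal's lock ordering only.
class LockOrderFixSketch {

    /** Stand-in for a Docker client; returns a fixed image list. */
    static final class FakeDockerClient {
        List<String> listImages() {
            return List.of("solr:8.11.1-slim");
        }
    }

    private final Object factoryLock = new Object(); // stands in for the DockerClientFactory lock
    private final Object cacheLock = new Object();   // stands in for the LocalImagesCache monitor
    private FakeDockerClient cachedClient;
    private Set<String> images;

    /** Like DockerClientFactory::client: first use does startup work under the factory lock. */
    FakeDockerClient client() {
        synchronized (factoryLock) {
            if (cachedClient == null) {
                cachedClient = new FakeDockerClient();
                // Startup work that consults the image cache while the factory lock is held.
                hasImage("solr:8.11.1-slim");
            }
            return cachedClient;
        }
    }

    /** Like LocalImagesCache::get after the change: resolve the client first. */
    boolean hasImage(String name) {
        FakeDockerClient dockerClient = client(); // factory lock is taken (and released) here
        return maybeInitCache(dockerClient).contains(name);
    }

    /** Holds only the cache lock and never calls back into the factory. */
    private Set<String> maybeInitCache(FakeDockerClient dockerClient) {
        synchronized (cacheLock) {
            if (images == null) {
                images = new HashSet<>(dockerClient.listImages());
            }
            return images;
        }
    }

    /** Runs a "start" thread and a "constructor" thread, as in the reproducer. */
    static boolean bothThreadsFinish() {
        LockOrderFixSketch sketch = new LockOrderFixSketch();
        Thread starter = new Thread(sketch::client);
        Thread constructor = new Thread(() -> sketch.hasImage("solr:8.11.1-slim"));
        starter.start();
        constructor.start();
        try {
            starter.join(5000);
            constructor.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !starter.isAlive() && !constructor.isAlive();
    }
}
```

In this sketch a thread may take the cache lock while holding the factory lock, but never the other way round, so the cycle from the stack traces cannot form.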

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

Ironlink commented, Apr 25, 2022 (1 reaction)

With the following code:

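import java.lang.management.ManagementFactory;
import java.util.Arrays;

import org.testcontainers.containers.SolrContainer;
import org.testcontainers.elasticsearch.ElasticsearchContainer;
import org.testcontainers.utility.DockerImageName;

public class Repro {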
    public static void main(String[] args) throws InterruptedException {
        String classPath = ManagementFactory.getRuntimeMXBean().getClassPath();
        Arrays.stream(classPath.split(":"))
                .filter(item -> item.contains("testcontainers"))
                .forEach(System.out::println);

        var solrContainer = new SolrContainer(DockerImageName.parse("solr:8.11.1-slim"));
        var solrThread = new Thread(solrContainer::start);
        solrThread.start();

        var elasticsearchContainer = new ElasticsearchContainer(DockerImageName.parse("docker.elastic.co/elasticsearch/elasticsearch:8.1.3")); // deadlock here
        var elasticsearchThread = new Thread(elasticsearchContainer::start);
        elasticsearchThread.start();

        solrThread.join();
        elasticsearchThread.join();
    }
}

I get this output:

/Users/me/.m2/repository/org/testcontainers/postgresql/1.17.1/postgresql-1.17.1.jar
/Users/me/.m2/repository/org/testcontainers/jdbc/1.17.1/jdbc-1.17.1.jar
/Users/me/.m2/repository/org/testcontainers/database-commons/1.17.1/database-commons-1.17.1.jar
/Users/me/.m2/repository/org/testcontainers/solr/1.17.1/solr-1.17.1.jar
/Users/me/.m2/repository/org/testcontainers/testcontainers/1.17.1/testcontainers-1.17.1.jar
/Users/me/.m2/repository/org/testcontainers/elasticsearch/1.17.1/elasticsearch-1.17.1.jar
2022-04-25 14:40:45.240  INFO [org.testcontainers.utility.ImageNameSubstitutor]: Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor')
2022-04-25 14:40:45.258  INFO [org.testcontainers.dockerclient.DockerClientProviderStrategy]: Loaded org.testcontainers.dockerclient.UnixSocketClientProviderStrategy from ~/.testcontainers.properties, will try it first
2022-04-25 14:40:45.540  INFO [org.testcontainers.dockerclient.DockerClientProviderStrategy]: Found Docker environment with local Unix socket (unix:///var/run/docker.sock)
2022-04-25 14:40:45.541  INFO [org.testcontainers.DockerClientFactory]: Docker host IP address is localhost
2022-04-25 14:40:45.577  INFO [org.testcontainers.DockerClientFactory]: Connected to docker: 
  Server Version: 20.10.11
  API Version: 1.41
  Operating System: Docker Desktop
  Total Memory: 1988 MB
2022-04-25 14:40:45.580  INFO [org.testcontainers.DockerClientFactory]: Checking the system...
2022-04-25 14:40:45.581  INFO [org.testcontainers.DockerClientFactory]: ✔︎ Docker server version should be at least 1.6.0

And a deadlock.

Maybe the line numbers are a bit off due to Lombok? The decompiled version of the method looks like this:

    public DockerClient client() {
        synchronized(this.$lock) {
            if (this.cachedClientFailure != null) {
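
The `$lock` field in the decompiled snippet is consistent with what Lombok's `@Synchronized` annotation is documented to generate: a private lock field initialised to an empty `Object[]`. That would also explain why the thread dumps report the monitor as `a [Ljava.lang.Object;`. The following hand-written equivalent is a sketch only; the class name and method body are hypothetical, and it is not the Testcontainers source.

```java
// Hand-written equivalent of a Lombok @Synchronized instance method (illustrative only).
class SynchronizedDesugared {

    // Lombok documents this exact field shape for @Synchronized instance methods.
    private final Object $lock = new Object[0];
    private int calls;

    // Hypothetical method body; only the locking pattern matters here.
    int client() {
        synchronized (this.$lock) {
            return ++calls;
        }
    }

    // The runtime class name of the lock, as a thread dump would print it.
    String lockTypeName() {
        return $lock.getClass().getName();
    }
}
```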

bsideup commented, May 7, 2022 (0 reactions)

Okay, I can confirm the deadlock (seems to be a regression in 1.17.x) and submitted a fix (see #5356) 👍

Thanks for the detailed report 💯
