Deadlock when starting containers from multiple threads
Problem
When I start two containers from separate threads, I get a synchronisation deadlock. The two locks in question are in DockerClientFactory and LocalImagesCache.
The LocalImagesCache lock is held by thread 1 (“main”), which is in the process of initialising the logger instance of an ElasticsearchContainer. The DockerClientFactory lock is held by thread 2 (“Thread-0”), which is in the process of starting a container. Each thread is blocked waiting for the lock that the other one holds.
Stack of thread 1
"main" #1 prio=5 os_prio=31 cpu=539.66ms elapsed=14.14s tid=0x000000012901c000 nid=0x2803 waiting for monitor entry [0x000000016db54000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:179)
- waiting to lock <0x000000061a800000> (a [Ljava.lang.Object;)
at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:102)
at com.github.dockerjava.api.DockerClientDelegate.listImagesCmd(DockerClientDelegate.java:168)
at org.testcontainers.images.LocalImagesCache.maybeInitCache(LocalImagesCache.java:69)
- locked <0x000000061a8001b8> (a org.testcontainers.images.LocalImagesCache)
at org.testcontainers.images.LocalImagesCache.get(LocalImagesCache.java:33)
at org.testcontainers.images.AbstractImagePullPolicy.shouldPull(AbstractImagePullPolicy.java:18)
at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:70)
at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:28)
at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:17)
- locked <0x000000061a8003b0> (a java.util.concurrent.atomic.AtomicReference)
at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:39)
at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1330)
at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:640)
at org.testcontainers.elasticsearch.ElasticsearchContainer.<init>(ElasticsearchContainer.java:85)
at com.example.testsupport.ElasticsearchExtension.lambda$startContainerAsync$2(ElasticsearchExtension.java:149)
at com.example.testsupport.ElasticsearchExtension$$Lambda$392/0x0000000800db6978.apply(Unknown Source)
at org.junit.jupiter.engine.execution.ExtensionValuesStore.lambda$getOrComputeIfAbsent$4(ExtensionValuesStore.java:86)
at org.junit.jupiter.engine.execution.ExtensionValuesStore$$Lambda$369/0x0000000800d9c620.get(Unknown Source)
at org.junit.jupiter.engine.execution.ExtensionValuesStore$MemoizingSupplier.computeValue(ExtensionValuesStore.java:223)
at org.junit.jupiter.engine.execution.ExtensionValuesStore$MemoizingSupplier.get(ExtensionValuesStore.java:211)
at org.junit.jupiter.engine.execution.ExtensionValuesStore$StoredValue.evaluate(ExtensionValuesStore.java:191)
at org.junit.jupiter.engine.execution.ExtensionValuesStore$StoredValue.access$100(ExtensionValuesStore.java:171)
at org.junit.jupiter.engine.execution.ExtensionValuesStore.getOrComputeIfAbsent(ExtensionValuesStore.java:89)
at org.junit.jupiter.engine.execution.ExtensionValuesStore.getOrComputeIfAbsent(ExtensionValuesStore.java:93)
at org.junit.jupiter.engine.execution.NamespaceAwareStore.getOrComputeIfAbsent(NamespaceAwareStore.java:61)
at com.example.testsupport.ElasticsearchExtension.startContainerAsync(ElasticsearchExtension.java:148)
at com.example.testsupport.ElasticsearchExtension.beforeAll(ElasticsearchExtension.java:55)
at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllCallbacks$10(ClassBasedTestDescriptor.java:381)
at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor$$Lambda$358/0x0000000800d99dc8.execute(Unknown Source)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
Stack of thread 2
"Thread-0" #17 prio=5 os_prio=31 cpu=134.90ms elapsed=13.35s tid=0x0000000129a0b000 nid=0x6703 waiting for monitor entry [0x0000000179472000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.testcontainers.images.LocalImagesCache.maybeInitCache(LocalImagesCache.java:61)
- waiting to lock <0x000000061a8001b8> (a org.testcontainers.images.LocalImagesCache)
at org.testcontainers.images.LocalImagesCache.get(LocalImagesCache.java:33)
at org.testcontainers.images.AbstractImagePullPolicy.shouldPull(AbstractImagePullPolicy.java:18)
at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:70)
at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:28)
at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:17)
- locked <0x0000000605918608> (a java.util.concurrent.atomic.AtomicReference)
at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:39)
at org.testcontainers.utility.RyukResourceReaper.maybeStart(RyukResourceReaper.java:95)
- locked <0x00000006058ca580> (a org.testcontainers.utility.RyukResourceReaper)
at org.testcontainers.utility.RyukResourceReaper.getLabels(RyukResourceReaper.java:79)
at org.testcontainers.DockerClientFactory.runInsideDocker(DockerClientFactory.java:374)
at org.testcontainers.DockerClientFactory.runInsideDocker(DockerClientFactory.java:368)
at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:238)
- locked <0x000000061a800000> (a [Ljava.lang.Object;)
at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:102)
at com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:108)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:325)
at com.righthub.publicipdata.pipeline.testsupport.TestContainerWrapper$$Lambda$386/0x0000000800db5380.run(Unknown Source)
at java.lang.Thread.run(java.base@17.0.1/Thread.java:833)
Here is a minimal reproducer:
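import org.testcontainers.containers.SolrContainer;
import org.testcontainers.elasticsearch.ElasticsearchContainer;
import org.testcontainers.utility.DockerImageName;

// The class name is arbitrary; only main() comes from the original report.
public class DeadlockReproducer {
    public static void main(String[] args) throws InterruptedException {
        // Start the first container on a background thread.
        var solrContainer = new SolrContainer(DockerImageName.parse("solr:8.11.1-slim"));
        var solrThread = new Thread(solrContainer::start);
        solrThread.start();
        // Constructing the second container on the main thread races with the start above.
        var elasticsearchContainer = new ElasticsearchContainer(DockerImageName.parse("docker.elastic.co/elasticsearch/elasticsearch:8.1.3")); // deadlock here
        var elasticsearchThread = new Thread(elasticsearchContainer::start);
        elasticsearchThread.start();
        solrThread.join();
        elasticsearchThread.join();
    }
}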
Proposed solution
I’m not very familiar with the code base of Testcontainers, but here is the smallest change I can think of to solve this:
- Change line 60 from
  private synchronized boolean maybeInitCache() {
  to
  private synchronized boolean maybeInitCache(DockerClient dockerClient) {
- Change line 33 from
  maybeInitCache();
  to
  maybeInitCache(DockerClientFactory.instance().client());
- Change line 38 from
  if (!maybeInitCache()) {
  to
  if (!maybeInitCache(DockerClientFactory.instance().client())) {
- Remove the field dockerClient (which is currently initialised to DockerClientFactory.lazyClient()).
With this change:
- The public interface of LocalImagesCache is left unmodified.
- The synchronisation boundaries of LocalImagesCache::maybeInitCache are left unmodified.
- The lock in DockerClientFactory::client is acquired before the lock in LocalImagesCache::maybeInitCache, which prevents this deadlock from occurring.
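The ordering argument above can be sketched in isolation. This is a simplified model under stated assumptions, not the Testcontainers implementation and not necessarily the fix that was eventually merged: ClientFactory and ImagesCache are hypothetical stand-ins, the client is just a String, and the factory's start-up path is reduced to a single call into the cache. The point it illustrates is that the cache resolves the client before entering its synchronized method, so no thread ever waits for the factory lock while holding the cache lock.

```java
public class ConsistentLockOrderSketch {

    /** Hypothetical stand-in for DockerClientFactory: client() takes the factory lock. */
    static final class ClientFactory {
        private ImagesCache cache; // wired after construction
        private String client;

        synchronized String client() {
            if (client == null) {
                client = "docker-client";
                // Start-up consults the cache while the factory lock is held,
                // mirroring the path seen in the stack of thread 2.
                cache.get();
            }
            return client;
        }
    }

    /** Hypothetical stand-in for LocalImagesCache. */
    static final class ImagesCache {
        private final ClientFactory factory;
        private boolean initialized;

        ImagesCache(ClientFactory factory) {
            this.factory = factory;
        }

        /** Returns true if this call initialised the cache. */
        boolean get() {
            // Resolve the client first: the factory lock is taken (and released)
            // before the cache lock, never while the cache lock is held.
            String client = factory.client();
            return maybeInitCache(client);
        }

        private synchronized boolean maybeInitCache(String client) {
            if (initialized) {
                return false;
            }
            initialized = true; // a real cache would list images through `client` here
            return true;
        }
    }

    /** Runs both code paths concurrently and reports whether both threads finished. */
    public static boolean bothThreadsFinish(long timeoutMillis) {
        ClientFactory factory = new ClientFactory();
        ImagesCache cache = new ImagesCache(factory);
        factory.cache = cache;

        Thread starter = new Thread(factory::client, "container-start");
        Thread constructor = new Thread(cache::get, "container-constructor");
        starter.setDaemon(true);
        constructor.setDaemon(true);
        starter.start();
        constructor.start();
        try {
            starter.join(timeoutMillis);
            constructor.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !starter.isAlive() && !constructor.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + bothThreadsFinish(5_000));
    }
}
```

A run that finishes does not prove the absence of deadlocks in general. The guarantee comes from the ordering itself: every path acquires the factory lock before the cache lock, so a cycle between the two cannot form.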
Issue Analytics
- State:
- Created a year ago
- Comments: 5 (3 by maintainers)
With the following code:
I get this output:
And a deadlock.
Maybe the line numbers are a bit off due to Lombok? The decompiled version of the method looks like this:
Okay, I can confirm the deadlock (seems to be a regression in 1.17.x) and submitted a fix (see #5356) 👍
Thanks for the detailed report 💯