Multi-module Maven project with parallel builds fails with UNAUTHORIZED
Environment:
- Jib version: 3.2.1
- Build tool: Maven - 3.8.5
- OS: Linux Ubuntu/MacOS
Description of the issue:
Building a multi-module Maven project with parallelized builds fails with UNAUTHORIZED
when each container uses the same base image. The build needs to run with an empty cache.
Maven command:
mvn clean verify -Djava.util.logging.config.file=logging.properties -T 1C
We enabled jib debug logs and could see that the request to fetch the layer didn’t contain an authorization token for one or more of the concurrent build steps.
We were able to reproduce this issue locally and are fairly certain that it happens when the cache is empty and then gets partially populated (manifests_configs.json exists, but not all the layers do). This led us to this section: https://github.com/GoogleContainerTools/jib/blob/master/jib-core/src/main/java/com/google/cloud/tools/jib/builder/steps/PullBaseImageStep.java#L134-L140, which links to https://github.com/GoogleContainerTools/jib/issues/2220, which describes the issue we are observing. Removing this branch makes the parallel build pass.
We are fairly certain that this is the same issue as seen in: https://github.com/GoogleContainerTools/jib/issues/2007#issuecomment-1007103136
Expected behavior:
Multiple parallel builds should correctly use the same base image (either cached or not) and build the proper tar artifacts.
Steps to reproduce:
- Configure a multi-module Maven project (4 submodules were used for reproducing this issue)
- Configure each module’s pom.xml to use Jib with the same base image.
- Run maven:
mvn clean verify -Djava.util.logging.config.file=logging.properties -DskipTests -T 1C
- It might take more than one try to reproduce this issue (we injected an artificial sleep in the StepRunner to force this race to happen).
jib-maven-plugin Configuration:
<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>jib-maven-plugin</artifactId>
  <configuration>
    <from>
      <image>gcr.io/images/baseimage:2022.03-2@sha256:<sha256></image>
      <credHelper>gcloud</credHelper>
    </from>
    <to>
      <image>gcr.io/${project}/${artifactId}:${version}</image>
    </to>
  </configuration>
</plugin>
Log output:
Caused by: com.google.cloud.tools.jib.api.RegistryUnauthorizedException: Unauthorized for gcr.io/baseimage
at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call (RegistryEndpointCaller.java:163)
at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call (RegistryEndpointCaller.java:114)
at com.google.cloud.tools.jib.registry.RegistryClient.callRegistryEndpoint (RegistryClient.java:623)
at com.google.cloud.tools.jib.registry.RegistryClient.lambda$pullBlob$3 (RegistryClient.java:494)
at com.google.cloud.tools.jib.hash.Digests.computeDigest (Digests.java:104)
at com.google.cloud.tools.jib.blob.WritableContentsBlob.writeTo (WritableContentsBlob.java:37)
at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeCompressedLayerBlobToDirectory (CacheStorageWriter.java:392)
at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeCompressed (CacheStorageWriter.java:226)
at com.google.cloud.tools.jib.cache.Cache.writeCompressedLayer (Cache.java:130)
at com.google.cloud.tools.jib.builder.steps.ObtainBaseImageLayerStep.call (ObtainBaseImageLayerStep.java:141)
at com.google.cloud.tools.jib.builder.steps.ObtainBaseImageLayerStep.call (ObtainBaseImageLayerStep.java:39)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:829)
Top GitHub Comments
Closing via #3767 which implements approach 1 to check for layers presence in cache, which should address the issue for this race condition. Please re-open if needed, and thanks again for the investigation!
Hmm… it’s hard to say what the best approach is unless I review the code holistically, which I don’t think I’ll do. I think the problem is that Jib assumes a base image is cached as long as a manifest JSON exists (saved and retrieved via Cache.writeMetadata() and Cache.retrieveMetadata).
But the direction of 1) doesn’t sound bad. It may not be that hard to add logic to check whether all the layer files described in a manifest are present, but I don’t really know. 2) is also conceivable, but there will be situations where it retrieves credentials only to never use them. Retrieving credentials may cause friction, and it delays the start of downloading anything from a registry.
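The layer-presence check mentioned under 1) could look roughly like the sketch below. It is an illustration only, not Jib's code: the class CachedBaseImageCheck, the method allLayersCached, and the one-directory-per-digest cache layout are all assumptions introduced here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: none of these names are part of Jib's real cache API.
final class CachedBaseImageCheck {

  /**
   * Returns true only if every layer digest listed in the manifest has a
   * matching directory under layersDir. The directory-per-digest layout is
   * an assumption made for this illustration.
   */
  static boolean allLayersCached(Path layersDir, List<String> manifestLayerDigests) {
    for (String digest : manifestLayerDigests) {
      if (!Files.isDirectory(layersDir.resolve(digest))) {
        // Manifest exists but a layer is missing: treat the image as not cached,
        // so the caller falls back to an authenticated pull.
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path layersDir = Files.createTempDirectory("layers");
    Files.createDirectory(layersDir.resolve("aaa"));

    System.out.println(allLayersCached(layersDir, List.of("aaa")));        // true
    System.out.println(allLayersCached(layersDir, List.of("aaa", "bbb"))); // false
  }
}
```

With a check like this, a partially populated cache would be handled the same way as an empty one, instead of being mistaken for a complete base image.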
Another option I can think of is to save the manifest (calling Cache.writeMetadata()) only after all layers are downloaded. It may not be that difficult to add another async Step in StepsRunner that depends on all layer-downloading Steps, but I don’t know.
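The ordering in this last option (write the metadata only once every layer download has finished) can be sketched with plain java.util.concurrent. This does not use Jib's StepsRunner or Cache classes: WriteMetadataLast, downloadThenWriteMetadata, and the Runnable stand-ins for the download and write steps are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the Runnables stand in for Jib's layer-download and
// metadata-writing steps; this is not how StepsRunner is actually structured.
final class WriteMetadataLast {

  /** Starts every layer download, then runs writeMetadata only if all of them succeeded. */
  static CompletableFuture<Void> downloadThenWriteMetadata(
      List<Runnable> layerDownloads, Runnable writeMetadata, ExecutorService executor) {
    CompletableFuture<?>[] downloads =
        layerDownloads.stream()
            .map(download -> CompletableFuture.runAsync(download, executor))
            .toArray(CompletableFuture[]::new);
    // allOf completes exceptionally if any download fails, so thenRun is skipped
    // and no manifest is written for an incomplete set of layers.
    return CompletableFuture.allOf(downloads).thenRun(writeMetadata);
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    List<String> events = Collections.synchronizedList(new ArrayList<>());
    List<Runnable> downloads =
        List.of(() -> events.add("layer-1"), () -> events.add("layer-2"));

    downloadThenWriteMetadata(downloads, () -> events.add("metadata"), executor).join();
    executor.shutdown();

    // The metadata event is always recorded last.
    System.out.println(events.get(events.size() - 1));
  }
}
```

The trade-off is that a concurrent build never sees a manifest without its layers, at the cost of one extra step that depends on every download.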