Pipeline cache randomly crashing builds
Question, Bug, or Feature?
Type: Bug
Enter Task Name: CacheBeta@1
Seems to relate to #11128, though this one is closed as being fixed. It could be a regression in V1 that needs to be tracked separately, so I opened a new issue just to be sure. If you need me to copy/paste this in the old issue and reopen it, I’ll be happy to do so.
Environment
Azure Pipelines with Microsoft-hosted agents. Here’s the config:
variables:
  AZP_CACHING_TAR: true
  # ....
resources:
  containers:
  - container: jdk
    image: 'openjdk:8'
# ....
jobs:
- job: backend_build
  pool:
    vmImage: 'ubuntu-latest'
  container: jdk
  steps:
  - task: CacheBeta@1
    inputs:
      key: $(gradle_cache_version) | gradle | $(Agent.OS) | backend/build.gradle.kts
      path: $(gradle_home)
      cacheHitVar: GRADLE_CACHE_RESTORED
    displayName: Cache gradle global modules
Issue Description
The same cache is used in many jobs (dependsOn is configured properly). The build crashes randomly in one of those jobs. Both the failing job and the number of bytes downloaded before the failure vary from run to run, and some builds pass. The smaller frontend cache does not seem to cause issues (though this could be anecdotal).
Error without debug on:
I seem to recall this occurring even before turning AZP_CACHING_TAR on.
Task logs
Log with debug on.
==============================================================================
Task : Cache (Beta)
Description : Cache files between runs
Version : 1.0.0
Author : Microsoft Corporation
Help : https://aka.ms/pipeline-caching-docs
==============================================================================
Resolving key: v6|gradle|Linux|backend/build.gradle.kts
- v6 [string]
- gradle [string]
- Linux [string]
- backend/build.gradle.kts [file] --> 225DA508920031789567D8B3B75D13E2FE698ADF73FAC54356F53F04424C4093
Resolved to: v6|gradle|Linux|A5+DO6PiPpN+d9hfHG1wVboZUV9q5yA31o7xCytbzAg=
##[debug]Processed: ##vso[task.settaskvariable variable=RESTORE_STEP_RAN;issecret=False;]true
##[debug]Dedup parallelism: 192
Information, Getting a pipeline cache artifact with one of the following fingerprints:
Information, Fingerprint: `v6|gradle|Linux|A5+DO6PiPpN+d9hfHG1wVboZUV9q5yA31o7xCytbzAg=`
Information, There is a cache hit: `v6|gradle|Linux|A5+DO6PiPpN+d9hfHG1wVboZUV9q5yA31o7xCytbzAg=`
Information, Used scope: 2;db5176fd-1a78-462a-814f-f67613ee328c;refs/heads/azure-pipelines;kronostechnologies/purecloud
##[debug]Processed: ##vso[telemetry.publish area=AzurePipelinesAgent;feature=PipelineCache]{"FileCount":"0","PlanId":"6c676603-8743-4c1c-8135-a1bd2ef4b73e","JobId":"57551886-cf72-59ec-191c-cfd72feeab17","TaskInstanceId":"c2a2a553-c9c0-53db-903a-dacfa51e5f6a","CacheResult":"Hit","ActionDurationMs":"413","ActionName":"PipelineCache.RestoreCache","ActionResult":"Success","AttemptNumber":"1","ItemCount":"0","Level":"ThirdParty","CreatedUtcNow":"2019-10-04T14:33:30.0235827Z","BaseAddress":"https://vsblobprodcca1.vsblob.visualstudio.com/A6d1bcd6d-b243-4c6d-ab5f-10aa4e919702/","X_TFS_Session":"7b20b50b-7ac8-47a6-ac79-00daa66c7232","DeploymentEnvironment":"PRODUCTION","DeploymentEnvironmentIsProduction":"True","VSOAccount":"vsblobprodcca1","OSName":"Linux","OSVersion":"2019","FrameworkDescription":".NET Core ","ProcessName":"Agent.PluginHos","DotNetReleaseDword":"-1","Version":"17.158.29305.0","ExceptionCount":"0"}
Entry found at fingerprint: `v6|gradle|Linux|A5+DO6PiPpN+d9hfHG1wVboZUV9q5yA31o7xCytbzAg=`
Information, ArtifactHttpRetryMessageHandler.SendAsync: https://vsblobprodcca1.vsblob.visualstudio.com/A6d1bcd6d-b243-4c6d-ab5f-10aa4e919702/_apis/dedup/chunks/3F7C5A4C1FCEFCB8E608BAD8C8102C2838D222C9933F777B38EA8AC1D4833B3E01 attempt 1/6 failed with StatusCode RedirectMethod, IsRetryableResponse False
##[debug]Starting 'tar' with arguments '-xf - -C .'...
Information, ArtifactHttpRetryMessageHandler.SendAsync: https://vsblobprodcca1.vsblob.visualstudio.com/A6d1bcd6d-b243-4c6d-ab5f-10aa4e919702/_apis/dedup/nodes/D357183F28A34CC7AC6A80291F4773C58529A9E33CF3AF5D9A0A443788021FCB02 attempt 1/6 failed with StatusCode RedirectMethod, IsRetryableResponse False
Information, Expected size to be downloaded: 479.7 MB
Information, Downloaded 0.0 MB out of 479.7 MB (0%).
Information, Downloaded 34.3 MB out of 479.7 MB (7%)
Information, Downloaded 234.9 MB out of 479.7 MB (49%).
Information, Downloaded 408.1 MB out of 479.7 MB (85%).
##[error]A task was canceled.
##[debug]Processed: ##vso[task.logissue type=error;]A task was canceled.
##[debug]Processed: ##vso[task.complete result=Failed;]
##[debug] at Microsoft.VisualStudio.Services.Content.Common.TargetBlockExtensions.SendAllAndCompleteAsync[T1,T2](ITargetBlock`1 targetBlock, IEnumerable`1 inputs, ITargetBlock`1 finalBlock, CancellationToken token)
at Microsoft.VisualStudio.Services.BlobStore.WebApi.DedupStoreClient.DownloadToWriterAsync(DedupNode node, Int32 writerParallelism, Func`3 writer, Uri proxyUri, EdgeCache edgeCache, CancellationToken cancellationToken)
at Microsoft.VisualStudio.Services.BlobStore.WebApi.DedupStoreClient.DownloadToDestinationAsync(MaybeCached`1 dedupBuffer, DedupIdentifier dedupId, Uri proxyUri, EdgeCache edgeCache, Func`3 chunkWriter, Func`5 nodeWriter, Action`1 traceDownloadProgressFunc, CancellationToken cancellationToken)
at Microsoft.VisualStudio.Services.BlobStore.WebApi.DedupStoreClient.DownloadToStreamAsync(DedupIdentifier dedupId, Stream stream, Uri proxyUri, EdgeCache edgeCache, Action`1 traceDownloadProgressFunc, Action`1 traceFinalizeDownloadProgressFunc, CancellationToken cancellationToken)
at Microsoft.VisualStudio.Services.BlobStore.WebApi.DedupManifestArtifactClient.DownloadToStreamAsync(DedupIdentifier dedupId, Stream stream, Uri proxyUri, CancellationToken cancellationToken)
at Agent.Plugins.PipelineCache.TarUtils.<>c__DisplayClass3_1.<<DownloadAndExtractTarAsync>b__3>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Agent.Plugins.PipelineCache.TarUtils.<>c__DisplayClass3_1.<<DownloadAndExtractTarAsync>b__3>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Agent.Plugins.PipelineCache.TarUtils.RunProcessAsync(AgentTaskPluginExecutionContext context, ProcessStartInfo processStartInfo, Func`3 additionalTaskToExecuteWhilstRunningProcess, Action actionOnFailure, CancellationToken cancellationToken)
at Agent.Plugins.PipelineCache.TarUtils.RunProcessAsync(AgentTaskPluginExecutionContext context, ProcessStartInfo processStartInfo, Func`3 additionalTaskToExecuteWhilstRunningProcess, Action actionOnFailure, CancellationToken cancellationToken)
at Agent.Plugins.PipelineCache.PipelineCacheServer.DownloadPipelineCacheAsync(AgentTaskPluginExecutionContext context, DedupManifestArtifactClient dedupManifestClient, DedupIdentifier manifestId, String targetDirectory, String contentFormat, CancellationToken cancellationToken)
at Agent.Plugins.PipelineCache.PipelineCacheServer.<>c__DisplayClass1_1.<<DownloadAsync>b__2>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.VisualStudio.Services.BlobStore.Common.Telemetry.BlobStoreClientTelemetry.<>c__DisplayClass5_0.<<MeasureActionAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.VisualStudio.Services.Content.Common.Telemetry.ClientTelemetry.MeasureActionAsync[TResult,TRecord](TRecord record, Func`1 actionAsync, Func`2 actionResultToTelemetryStatus, Func`2 actionResultToItemCountAsync, Action`2 updateRecord)
at Agent.Plugins.PipelineCache.PipelineCacheServer.DownloadAsync(AgentTaskPluginExecutionContext context, Fingerprint[] fingerprints, String path, String cacheHitVariable, CancellationToken cancellationToken)
at Agent.Plugins.PipelineCache.RestorePipelineCacheV0.ProcessCommandInternalAsync(AgentTaskPluginExecutionContext context, Fingerprint fingerprint, Func`1 restoreKeysGenerator, String path, CancellationToken token)
at Agent.Plugins.PipelineCache.PipelineCacheTaskPluginBase.RunAsync(AgentTaskPluginExecutionContext context, CancellationToken token)
at Agent.PluginHost.Program.Main(String[] args)
Finishing: Cache gradle global modules
Side note
Downloading the cache, as mentioned in #11128, seems slow. It varies widely, but in my experience it can take up to 1 min 20 s for ~450 MB. Is there a way to speed this up? Should I create a separate issue to track this?
Thank you!
Issue Analytics
- State:
- Created 4 years ago
- Comments: 10 (5 by maintainers)
Top GitHub Comments
@ulrikstrid - This is the tar issue I was referring to, which should be fixed in the next agent release. The relevant log line is: ##[debug]Starting 'tar' with arguments '-xf - -C .'... So, if a cache entry was uploaded as a tar, it will always be untarred when it is downloaded, and that path is currently broken on Linux.
Yup, there is a race condition that makes the restore fail most of the time and pass only very rarely. Closing the issue. Thanks.
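For readers unfamiliar with the restore path discussed above, here is a minimal Python sketch of the general pattern the log points to: archive bytes are written to the standard input of `tar -xf - -C <dir>` while tar extracts them. This is not the agent's code (the agent is a .NET program), and the helper names `make_tar_bytes`, `chunked`, and `stream_into_tar` are made up for illustration. The thread does not say where exactly the race was. The sketch only shows the ordering a correct version needs: write every chunk, close stdin, then wait for tar to exit.

```python
import io
import subprocess
import tarfile
import tempfile
from pathlib import Path


def make_tar_bytes(files):
    """Build an in-memory tar archive standing in for a downloaded cache entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def chunked(data, size=4096):
    """Yield the archive in pieces, the way a chunked download would deliver it."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def stream_into_tar(chunks, target_dir):
    """Pipe chunks into `tar -xf - -C target_dir` and return tar's exit code."""
    proc = subprocess.Popen(
        ["tar", "-xf", "-", "-C", str(target_dir)],
        stdin=subprocess.PIPE,
    )
    try:
        for chunk in chunks:
            # Raises BrokenPipeError if tar has already exited.
            proc.stdin.write(chunk)
    finally:
        # Closing stdin tells tar that no more archive data will arrive.
        proc.stdin.close()
    # Only after stdin is closed do we wait for tar to finish extracting.
    return proc.wait()


if __name__ == "__main__":
    payload = make_tar_bytes({"modules/a.txt": b"hello"})
    with tempfile.TemporaryDirectory() as tmp:
        exit_code = stream_into_tar(chunked(payload), tmp)
        print(exit_code, sorted(p.name for p in Path(tmp).rglob("*")))
```

The sketch assumes a `tar` binary is on the PATH. If the producer is cancelled or stdin is closed before all chunks are written, tar sees a truncated archive, which is one way a pipeline like this can fail intermittently.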