Build event uploader is not robust to upload failures and does not respect no-cache tag
Description of the problem / feature request:
The BEP code that uploads missing files to remote cache has problematic behavior in two cases:
- Remote cache uploads that happen during bazel build are robust to upload failures: if a CAS blob upload fails even after retries, bazel will print a warning message but continue the build. However, the BEP uploader stage will also try to upload any missing blobs referenced in build events, and if the upload fails there, the whole build fails.
- While the no-cache tag works correctly in that an AC entry never gets written to the remote cache, the BEP uploader’s “upload-referenced-blobs” behavior means that no-cache cannot be used to prevent writing the action outputs (= CAS blobs) to the remote cache. In particular, we would like to use no-cache tags for actions that generate large artifacts that change often, and therefore not upload them to the remote cache.
--nobuild_event_json_file_path_conversion fixes these issues, but it has the downside that none of the entries in the BES report reference the files in the remote cache, even if they were uploaded there as part of the build process.
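To check which form the file references in a BEP JSON file actually take, one can scan it for uri fields. The following is a minimal sketch, assuming the file written by --build_event_json_file holds one JSON-encoded event per line, that file references carry a string field named "uri", and that remote cache references use the bytestream:// scheme while local ones use file://. The helper names iter_uris and count_uri_schemes are made up for this illustration.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def iter_uris(node):
    """Yield every string stored under a "uri" key anywhere in a JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uri" and isinstance(value, str):
                yield value
            else:
                yield from iter_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_uris(item)


def count_uri_schemes(lines):
    """Count URI schemes over an iterable of JSON-encoded build events.

    `lines` can be an open file object or any iterable of strings;
    blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for uri in iter_uris(json.loads(line)):
            counts[urlparse(uri).scheme] += 1
    return counts
```

Calling count_uri_schemes on an open be.json file returns a Counter keyed by scheme, so a build run with path conversion disabled would be expected to show only file entries under these assumptions.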
Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run a remote cache at grpc://localhost:8980 (I used bb-storage).
Create two targets that output a large file, and add a no-cache tag to one of them.
$ cat BUILD
genrule(
    name = "bigfile",
    outs = ["bigfile.img"],
    cmd = "fallocate -l 1G \"$@\"",
)
genrule(
    name = "bigfile_nocache",
    outs = ["bigfile_nocache.img"],
    cmd = "fallocate -l 1G \"$@\"",
    tags = ["no-cache"],
)
We simulate upload failures by disabling retries and using a small timeout value. Without --build_event_json_file the build of //:bigfile passes with warnings:
$ bazel build //:bigfile --remote_cache=grpc://localhost:8980 --remote_instance_name=default --remote_retries=0 --remote_timeout=3
INFO: Invocation ID: 5a9ad001-0658-4bc6-8167-bdb9be6ecc3a
INFO: Analyzed target //:bigfile (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
WARNING: Writing to Remote Cache:
BulkTransferException
Target //:bigfile up-to-date:
bazel-bin/bigfile.img
INFO: Elapsed time: 6.079s, Critical Path: 5.93s
INFO: 1 process: 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
$ echo $?
0
With --build_event_json_file the build fails:
$ bazel build //:bigfile --remote_cache=grpc://localhost:8980 --remote_instance_name=default --remote_retries=0 --remote_timeout=3 --build_event_json_file=be.json
INFO: Invocation ID: 468776d9-4ca8-4501-8a1b-b610df0b67ca
INFO: Analyzed target //:bigfile (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
WARNING: Writing to Remote Cache:
BulkTransferException
Target //:bigfile up-to-date:
bazel-bin/bigfile.img
INFO: Elapsed time: 6.120s, Critical Path: 5.96s
INFO: 1 process: 1 linux-sandbox.
ERROR: Unable to write all BEP events to file due to 'java.io.IOException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 2999124047ns'
INFO: Build completed successfully, 2 total actions
$ echo $?
38
The BEP uploader also doesn’t respect the no-cache tag (notice that the build no longer emits BulkTransferException, because the no-cache tag stops it from attempting to upload the results):
$ bazel build //:bigfile_nocache --remote_cache=grpc://localhost:8980 --remote_instance_name=default --remote_retries=0 --remote_timeout=3 --build_event_json_file=be.json
INFO: Invocation ID: af3a755b-6f51-4333-9f3c-95deb342c731
INFO: Analyzed target //:bigfile_nocache (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
Target //:bigfile_nocache up-to-date:
bazel-bin/bigfile_nocache.img
INFO: Elapsed time: 3.074s, Critical Path: 2.92s
INFO: 1 process: 1 linux-sandbox.
ERROR: Unable to write all BEP events to file due to 'java.io.IOException: io.grpc.StatusRuntimeException: UNKNOWN: context deadline exceeded'
INFO: Build completed successfully, 2 total actions
$ echo $?
38
What operating system are you running Bazel on?
Ubuntu 18.04.3 LTS
What’s the output of bazel info release?
release 3.1.0
Have you found anything relevant by searching the web?
No.
Any other information, logs, or outputs that you want to share?
I wonder if the BES uploader should behave as if --nobuild_event_json_file_path_conversion were passed for the files that have not been uploaded to the remote cache by the build?
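The suggestion above could be sketched roughly as follows. This only illustrates the idea and is not Bazel's implementation: the LocalFile type, the report_uri function, the uploaded_by_build set, and the bytestream URI layout used here are all assumptions for the example.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class LocalFile:
    """A build output referenced by a build event (hypothetical type)."""
    path: str    # absolute local path, e.g. "/out/bigfile.img"
    digest: str  # content hash of the file
    size: int    # size in bytes


def report_uri(
    f: LocalFile,
    uploaded_by_build: Set[str],
    cache_prefix: str = "bytestream://localhost:8980/default",
) -> str:
    """Pick the URI to report for `f` without triggering any new upload.

    Files whose digest the build already uploaded get a remote cache URI.
    Everything else (failed uploads, outputs of no-cache actions) falls
    back to a local file URI, as if path conversion were disabled for
    that one file.
    """
    if f.digest in uploaded_by_build:
        # Assumed URI layout: <prefix>/blobs/<hash>/<size>
        return f"{cache_prefix}/blobs/{f.digest}/{f.size}"
    return "file://" + f.path
```

With this shape, a failed or skipped upload never reaches the event uploader as an error. It only changes which URI ends up in the report.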
Fixed by https://github.com/bazelbuild/bazel/pull/14338 and cherry-picked into 5.0.0rc2 by https://github.com/bazelbuild/bazel/pull/14389.
Sorry for the delay. I will try to get this into 5.0.