Build event uploader is not robust to upload failures and does not respect no-cache tag

Description of the problem / feature request:

The BEP code that uploads missing files to remote cache has problematic behavior in two cases:

  1. Remote cache uploads that happen during bazel build are robust to upload failures: if a CAS blob upload fails even after retries, bazel will print a warning message but continue the build. However, the BEP uploader stage will also try to upload any missing blobs referenced in build events, and if the upload fails there, the whole build fails.
  2. While the no-cache tag works correctly in that an AC entry never gets written to the remote cache, the BEP uploader’s “upload-referenced-blobs” behavior means that no-cache cannot be used to prevent the action outputs (= CAS blobs) from being written to the remote cache. In particular, we would like to use no-cache tags for actions that generate large artifacts that change often, so that those artifacts are not uploaded to the remote cache.
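
The difference between the two stages in point 1 can be pictured with a toy model. This is only a sketch of the observed behavior, not Bazel's implementation: `UploadError`, `flaky_upload`, `build_stage_upload` and `bep_stage_upload` are all hypothetical names made up for illustration.

```python
import warnings


class UploadError(Exception):
    """Hypothetical stand-in for a failed CAS blob upload."""


def flaky_upload(blob):
    # Simulates a remote cache that always times out.
    raise UploadError(f"deadline exceeded uploading {blob}")


def build_stage_upload(blobs, upload):
    """Tolerant stage: a failed upload becomes a warning, the build goes on."""
    failed = []
    for blob in blobs:
        try:
            upload(blob)
        except UploadError as err:
            warnings.warn(f"Writing to Remote Cache: {err}")
            failed.append(blob)
    return failed


def bep_stage_upload(blobs, upload):
    """Strict stage: the first failed upload propagates to the caller."""
    for blob in blobs:
        upload(blob)
```

In this model, `build_stage_upload(["bigfile.img"], flaky_upload)` returns the list of blobs it could not upload, while `bep_stage_upload(["bigfile.img"], flaky_upload)` raises, which corresponds to the failing invocation described above.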

--nobuild_event_json_file_path_conversion fixes these issues, but it has the downside that none of the entries in the BES report reference the files in the remote cache, even if they were uploaded there as part of the build process.

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Run a remote cache at grpc://localhost:8980 (I used bb-storage).

Create two targets that each output a large file, and add a no-cache tag to one of them.

$ cat BUILD 
genrule(
   name = "bigfile",
   outs = ["bigfile.img"],
   cmd = "fallocate -l 1G \"$@\"",
)
genrule(
   name = "bigfile_nocache",
   outs = ["bigfile_nocache.img"],
   cmd = "fallocate -l 1G \"$@\"",
   tags = ["no-cache"],
)

We simulate upload failures by disabling retries and using a small timeout value. Without --build_event_json_file the build of //:bigfile passes with warnings:

$ bazel build //:bigfile --remote_cache=grpc://localhost:8980 --remote_instance_name=default --remote_retries=0 --remote_timeout=3
INFO: Invocation ID: 5a9ad001-0658-4bc6-8167-bdb9be6ecc3a
INFO: Analyzed target //:bigfile (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
WARNING: Writing to Remote Cache:
BulkTransferException
Target //:bigfile up-to-date:
  bazel-bin/bigfile.img
INFO: Elapsed time: 6.079s, Critical Path: 5.93s
INFO: 1 process: 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
$ echo $?
0

With --build_event_json_file the build fails:

$ bazel build //:bigfile --remote_cache=grpc://localhost:8980 --remote_instance_name=default --remote_retries=0 --remote_timeout=3 --build_event_json_file=be.json
INFO: Invocation ID: 468776d9-4ca8-4501-8a1b-b610df0b67ca
INFO: Analyzed target //:bigfile (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
WARNING: Writing to Remote Cache:
BulkTransferException
Target //:bigfile up-to-date:
  bazel-bin/bigfile.img
INFO: Elapsed time: 6.120s, Critical Path: 5.96s
INFO: 1 process: 1 linux-sandbox.
ERROR: Unable to write all BEP events to file due to 'java.io.IOException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 2999124047ns'
INFO: Build completed successfully, 2 total actions
$ echo $?
38

The BEP uploader also doesn’t respect the no-cache tag (notice that the build does not emit BulkTransferException anymore, because it doesn’t attempt to upload the results due to the no-cache tag):

$ bazel build //:bigfile_nocache --remote_cache=grpc://localhost:8980 --remote_instance_name=default --remote_retries=0 --remote_timeout=3 --build_event_json_file=be.json
INFO: Invocation ID: af3a755b-6f51-4333-9f3c-95deb342c731
INFO: Analyzed target //:bigfile_nocache (5 packages loaded, 7 targets configured).
INFO: Found 1 target...
Target //:bigfile_nocache up-to-date:
  bazel-bin/bigfile_nocache.img
INFO: Elapsed time: 3.074s, Critical Path: 2.92s
INFO: 1 process: 1 linux-sandbox.
ERROR: Unable to write all BEP events to file due to 'java.io.IOException: io.grpc.StatusRuntimeException: UNKNOWN: context deadline exceeded'
INFO: Build completed successfully, 2 total actions
$ echo $?
38
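
To check which files a build event file references remotely, something like the following sketch may help. It assumes that be.json contains one JSON object per line and that file references are stored under a key named "uri"; the function names `iter_uris` and `count_uri_schemes` are hypothetical, and the script deliberately does not depend on any particular event layout.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def iter_uris(node):
    """Yield every string stored under a "uri" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uri" and isinstance(value, str):
                yield value
            else:
                yield from iter_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_uris(item)


def count_uri_schemes(lines):
    """Count URI schemes across newline-delimited JSON build events."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for uri in iter_uris(json.loads(line)):
            counts[urlparse(uri).scheme] += 1
    return counts
```

It can be run against a finished build with `with open("be.json") as f: print(count_uri_schemes(f))`. Entries counted under a remote scheme are the ones the BEP uploader had to make available in the remote cache.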

What operating system are you running Bazel on?

Ubuntu 18.04.3 LTS

What’s the output of bazel info release?

release 3.1.0

Have you found anything relevant by searching the web?

No.

Any other information, logs, or outputs that you want to share?

I wonder if the BES uploader should behave as if --nobuild_event_json_file_path_conversion was passed for the files that have not been uploaded to the remote cache by the build?
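
A rough sketch of that idea, under stated assumptions: `event_uri` is a hypothetical helper, `uploaded_digests` stands for whatever record the build keeps of blobs it already uploaded, and the `bytestream://` layout shown is an assumption about how remote references are formed, not something taken from Bazel's source.

```python
from pathlib import Path


def event_uri(local_path, digest, size_bytes, uploaded_digests,
              remote="localhost:8980", instance="default"):
    """Pick the URI to record in a build event for one output file.

    Hypothetical behavior: reference the remote cache only when the build
    itself already uploaded the blob. Otherwise keep a local file URI and
    skip the upload entirely. local_path must be absolute.
    """
    if digest in uploaded_digests:
        return f"bytestream://{remote}/{instance}/blobs/{digest}/{size_bytes}"
    return Path(local_path).as_uri()
```

With this rule, an output of a no-cache action, or one whose upload failed during the build, would simply be reported with its local path, and the BEP stage would have nothing left to upload.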

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 6
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

Wyverald commented, Dec 7, 2021 (3 reactions)

coeuvre commented, Oct 28, 2021 (1 reaction)

Sorry for the delay. I will try to make this into 5.0.
