Not caching remotely when test has failed previously

Description of the problem / feature request:

When a test has previously failed, rerunning it neither stores the result in the remote cache nor tries to fetch a result from it. A similar problem occurs with remote execution: the Action and ExecuteRequest sent to the remote execution server have DoNotCache and SkipCacheLookup set to true, respectively.

This is a problem because changing the test does not reset the caching behavior: the test has to be rerun until its recorded status is passed, and only then is caching enabled again. As a workaround, we currently delete all the test.cache_status files under the bazel-testlogs symlink.
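The workaround of deleting the test.cache_status files can be sketched as a small shell helper. This is only an illustration: the function name `clear_test_cache_status` is hypothetical, and it assumes the files sit somewhere below the bazel-testlogs symlink in the workspace root, as described above.

```shell
# Hypothetical helper (not part of Bazel): delete every test.cache_status
# file below the test log directory so that a previously failed status is
# no longer recorded there.
clear_test_cache_status() {
  # $1: test log directory; assumed to default to the bazel-testlogs
  # symlink in the current workspace.
  testlogs="${1:-bazel-testlogs}"

  # Nothing to do if the directory (or symlink target) does not exist yet.
  [ -d "$testlogs" ] || return 0

  # -H makes find follow the symlink given on the command line.
  find -H "$testlogs" -type f -name 'test.cache_status' -exec rm -f -- {} +
}
```

Calling `clear_test_cache_status` from the workspace root before `bazel test --config=remote_cache //:go_default_test` matches the manual step described above. Other files in the test log directories, such as test.log, are left in place.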

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The .bazelrc in use contains:

build:remote_cache --remote_http_cache=<cacheip>
build:remote_cache --remote_local_fallback=true
build:remote_cache --remote_upload_local_results=true

Example shell script that reproduces the missing upload:

bazel test --config=remote_cache //:go_default_test # Test didn't pass
git apply fix_go_default_test.patch # Apply patch for fixing the test
bazel test --config=remote_cache //:go_default_test # Test passes but test result isn't put into the remote cache
bazel clean
bazel test --config=remote_cache //:go_default_test # Test isn't found in the remote cache and is run locally.

Example that reproduces the missing cache lookup:

# Test is broken in environment1
environment1$ bazel test --config=remote_cache //:go_default_test 
# Test didn't pass
# Test is not broken in environment2
environment2$ bazel test --config=remote_cache //:go_default_test
# Test passes and result is uploaded to remote cache
# Fixed version of the test is fetched to environment1
environment1$ bazel test --config=remote_cache //:go_default_test
# Test isn't fetched from the remote cache and it is instead run locally.

What operating system are you running Bazel on?

Ubuntu 16.04

What’s the output of `bazel info release`?

release 0.29.1

Have you found anything relevant by searching the web?

Nope

Top GitHub Comments

ulfjack commented, Apr 9, 2020

It’s not clear how the documentation should apply to remote caching / execution. For example, should Bazel be allowed to write to the remote cache after a failed test? It’s writing to the local cache, right? Also, the remote cache generally doesn’t store failed test results, so why would we need to explicitly tell it to not read a cached entry since it generally cannot be the result from the failing run?

The intent behind the documentation is that you do not get a cached failure, not that you don’t get a cached pass (you can never get a cached pass from Skyframe or from the local action cache, and the documentation writer may only have thought of those two, not of the on-disk or remote caches). The documentation might actually pre-date the widespread use of remote caching inside Google - this part of the code is pretty old.

Regardless, we seem to have a case where we’re losing performance and remote execution capacity for no obvious reason. Even if the documentation were fully prescriptive about the remote cache/execution, this seems like a good reason to, at least, consider changing the current behavior.

meisterT commented, Apr 7, 2020

This is documented behavior: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/test/TestConfiguration.java;l=113

We can of course discuss whether it makes sense one way or the other.