dynamic execution: tree artifact intermittently contains .tmp files
Description of the bug:
We are trying to prove out an internal Buildfarm deployment. One of my builds fails intermittently because tree artifacts sometimes contain errant `.tmp` files. Sometimes the `.tmp` files are identical to their non-`.tmp` counterparts, but other times they are significantly smaller, as if a download was interrupted (e.g. 1MiB instead of 7MiB).
- We’re using dynamic execution via `--internal_spawn_scheduler` and `--strategy={Action,Javac,GoCompilePkg,CppCompile}=dynamic`.
- We’re also using `--remote_local_fallback` and (the purportedly deprecated/no-op) `--remote_local_fallback_strategy=sandboxed`.
- As we’re proving out the RBE deployment, we’ve disabled caching and will never see cache hits (`--noremote_upload_local_results`, `--noremote_accept_cached`).
- We are not using `--experimental_local_lockfree_output`.
- About the generating action(s):
  - The action class doesn’t define a mnemonic, so (I assume) it falls under the dynamic strategy via the `Action` mnemonic.
  - The action declares a tree artifact output containing copies of its inputs in a single directory (roughly `cp -t $OUT $SRCS`). It doesn’t create any temporary files of its own.
  - The remote execution gRPC log indicates that the client does not download any files with a `.tmp` extension. Rather, these (presumably) are the temporary outputs created by the remote execution strategy.
- About the consuming action(s):
  - Consumers are genrules which invoke a utility with `$(execpath :tool) $(execpath :treeartifact)`.
  - If the action fails, we run `find -L . -ls; find -L . -type f | xargs sha256sum`.
    - This is how I discovered our inputs include `.tmp` shadows of the expected files.
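The diagnostic step above (list everything, then hash it) can be narrowed to just the suspicious files. The following is a minimal Python sketch, not part of the original report: `find_tmp_shadows` is a hypothetical helper that walks a directory, pairs every `*.tmp` file with its non-`.tmp` counterpart, and reports whether the two are identical or differ in size.

```python
import hashlib
import os


def _sha256(path):
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_tmp_shadows(root):
    """Report each *.tmp file under root alongside its non-.tmp counterpart.

    Hypothetical helper for illustration; follows symlinks like `find -L`.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            if not name.endswith(".tmp"):
                continue
            tmp_path = os.path.join(dirpath, name)
            real_path = tmp_path[: -len(".tmp")]
            real_exists = os.path.isfile(real_path)
            rows.append({
                "tmp": tmp_path,
                "tmp_size": os.path.getsize(tmp_path),
                "real_size": os.path.getsize(real_path) if real_exists else None,
                "identical": real_exists and _sha256(tmp_path) == _sha256(real_path),
            })
    # os.walk order is not guaranteed, so sort for stable output.
    return sorted(rows, key=lambda row: row["tmp"])
```

A row with `identical` set to `False` and a `tmp_size` smaller than `real_size` matches the "interrupted download" symptom described in this report.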
I have just added `--experimental_debug_spawn_scheduler` in the hope that there will be more clues in the log the next time this occurs.
What’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
~Still working on determining an actual repro case.~ I can’t reliably repro unless I’m running under the debugger and intentionally delay the local branch. (See “Any other information” below.)
Update: See https://github.com/beasleyr-vmw/repro-bazelbuild-bazel-16145 .
Which operating system are you running Bazel on?
CentOS 8
What is the output of `bazel info release`?
5.2.0-vmware
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
No response
What’s the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
- https://jmmv.dev/series.html#Bazel dynamic execution is a great read for outsiders.
- https://github.com/bazelbuild/bazel/issues/11339
- https://github.com/bazelbuild/bazel/issues/12454
Any other information, logs, or outputs that you want to share?
I’m goofing around w/ IntelliJ and I set a breakpoint at https://github.com/bazelbuild/bazel/blob/5.2.0/src/main/java/com/google/devtools/build/lib/exec/AbstractSpawnStrategy.java#L279. Single-stepping through the code under the debugger, I observed the following:
- The local branch wins and proceeds to cancel the remote branch.
- Even though the remote branch is to be cancelled, because I’m single-stepping on the dynamic strategy’s local execution branch’s thread, the remote execution strategy proceeds and the gRPC download begins. (I have a terminal running `watch -n .5 ls bazel-bin/whatever/...` and observe `myexpectedoutput.tmp` appearing.)
- I finish single-stepping and Bazel resumes normal operation. However, I observe that the `.tmp` files are never reaped.
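The sequence observed under the debugger can be illustrated with a toy model. This Python sketch is not Bazel's code. It assumes, as the report presumes, that the remote branch writes each output to `<name>.tmp` next to the final path and renames it on completion. The names `remote_download` and `simulate_local_win` are hypothetical.

```python
import os


def remote_download(out_path, payload, cancelled_after_chunks=None, chunk_size=4):
    """Toy remote branch: stream into <out>.tmp, then rename onto <out>.

    If the download is cut off and nothing cleans up, the partial .tmp stays.
    Returns the path of the leftover .tmp file, or None on a clean finish.
    """
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "wb") as f:
        for i, start in enumerate(range(0, len(payload), chunk_size)):
            if cancelled_after_chunks is not None and i >= cancelled_after_chunks:
                return tmp_path  # interrupted: no rename and no cleanup
            f.write(payload[start:start + chunk_size])
    os.replace(tmp_path, out_path)
    return None


def simulate_local_win(out_dir):
    """The local branch writes the real output; the remote branch is cancelled late."""
    out_path = os.path.join(out_dir, "myexpectedoutput")
    payload = b"0123456789abcdef"
    with open(out_path, "wb") as f:  # local branch wins and produces the output
        f.write(payload)
    # Cancellation only takes effect after the first chunk has been written.
    remote_download(out_path, payload, cancelled_after_chunks=1)
    return sorted(os.listdir(out_dir))
```

In this model the output directory ends up holding both `myexpectedoutput` and a truncated `myexpectedoutput.tmp`, which is the shape of the stray files seen in the tree artifact.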
Top GitHub Comments
I agree that this is likely the same issue as https://github.com/bazelbuild/bazel/pull/11340#issuecomment-629246895 (“internal” in that discussion means the Google-internal counterpart of remote build execution).
I’ve put together PR #16170 to implement the fix proposed by jmmv. (I’m cheating a bit by using a hash instead of a counter; it will probably have to change before submission.)
@beasleyr-vmw Would you like to give this PR a spin and see if the stray .tmp files are gone? I tried producing a consistent repro but wasn’t successful.
Thanks for putting this together so soon. This absolutely helps and solves the problem of .tmp files landing in my output directory. One thing missing is that the .tmp files are left behind under `_tmp/actions`, but afaict they’re cleaned before the next execution.

Might be too late to be useful, but I finally put together a repro case: https://github.com/beasleyr-vmw/repro-bazelbuild-bazel-16145
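The general idea behind the fix discussed in this thread can be sketched as follows. This is an assumption-laden illustration in Python, not the code from the PR. The thread only states that the fix uses a hash (instead of a counter) and that temporary files end up under `_tmp/actions`. The helpers `scratch_path_for` and `download_via_scratch`, and the exact naming scheme, are hypothetical.

```python
import hashlib
import os


def scratch_path_for(scratch_dir, out_path):
    """Derive a temp file name from a hash of the output path (assumed scheme)."""
    digest = hashlib.sha256(out_path.encode("utf-8")).hexdigest()[:16]
    return os.path.join(scratch_dir, digest + ".tmp")


def download_via_scratch(scratch_dir, out_path, payload, interrupt=False):
    """Write into a scratch directory outside the output tree, then move into place.

    An interrupted download leaves its partial file in scratch_dir only, so
    consumers of the output tree never see it. Returns True on completion.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    tmp_path = scratch_path_for(scratch_dir, out_path)
    with open(tmp_path, "wb") as f:
        f.write(payload[: len(payload) // 2] if interrupt else payload)
    if interrupt:
        return False  # the partial file stays behind in the scratch directory
    parent = os.path.dirname(out_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # os.replace requires both paths to be on the same filesystem.
    os.replace(tmp_path, out_path)
    return True
```

This matches the behaviour reported above: stray files no longer land in the output directory, but leftovers can remain in the scratch location until something cleans it.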