dynamic execution: tree artifact intermittently contains .tmp files


Description of the bug:

We are trying to prove out an internal Buildfarm deployment. One of my builds fails intermittently because tree artifacts sometimes contain errant .tmp files. Sometimes the .tmp files are identical to the non-.tmp counterpart, but other times they are significantly smaller as if a download was interrupted (e.g. 1MiB instead of 7MiB).

  • We’re using dynamic execution via --internal_spawn_scheduler and --strategy={Action,Javac,GoCompilePkg,CppCompile}=dynamic.
  • We’re also using --remote_local_fallback and (the purportedly deprecated/no-op) --remote_local_fallback_strategy=sandboxed.
  • As we’re proving out the RBE deployment, we’ve disabled caching and will never see cache hits (--noremote_upload_local_results, --noremote_accept_cached).
  • We are not using --experimental_local_lockfree_output.
  • About the generating action(s):
    • The action class doesn’t define a mnemonic, so (I assume) it falls under the dynamic strategy via the Action mnemonic.
    • The action declares a tree artifact output containing copies of its inputs into a single directory (roughly cp -t $OUT $SRCS). It doesn’t create any temporary files of its own.
    • The remote execution gRPC log indicates that the client does not download any files with a .tmp extension. Rather, these (presumably) are the temporary outputs created by the remote execution strategy.
  • About the consuming action(s):
    • Consumers are genrules which invoke a utility with $(execpath :tool) $(execpath :treeartifact).
    • If the action fails, we run find -L . -ls; find -L . -type f | xargs sha256sum.
      • This is how I discovered our inputs include .tmp shadows of the expected files.
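
The find/sha256sum check described above can also be scripted. The following is only a rough sketch, not part of the original report: the function name `find_tmp_shadows` and the status labels are made up for illustration, and it assumes every shadow file is named exactly like its counterpart plus a `.tmp` suffix.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_tmp_shadows(tree_root):
    """Classify every *.tmp file under tree_root against its non-.tmp sibling.

    Returns a sorted list of (relative_path, status), where status is
    "identical", "truncated-or-different", or "orphan" (no sibling found).
    Symlinked directories are followed, similar to `find -L`.
    """
    root = Path(tree_root)
    report = []
    for dirpath, _dirs, files in os.walk(root, followlinks=True):
        for name in files:
            if not name.endswith(".tmp") or name == ".tmp":
                continue
            tmp = Path(dirpath) / name
            sibling = tmp.with_name(name[: -len(".tmp")])
            if not sibling.is_file():
                status = "orphan"
            elif sha256_of(tmp) == sha256_of(sibling):
                status = "identical"
            else:
                status = "truncated-or-different"
            report.append((tmp.relative_to(root).as_posix(), status))
    return sorted(report)
```

Pointing this at the tree artifact directory of a failed consumer action should list each `.tmp` shadow and say whether it matches its counterpart, which mirrors the two cases described at the top of the report (identical copies versus much smaller files).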

I have just added --experimental_debug_spawn_scheduler with the hope that there will be more clues in the log the next time this occurs.

What’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

~Still working on determining an actual repro case.~ I can’t reliably repro unless I’m running under the debugger and intentionally delay the local branch. (See “Any other information” below.)

Update: See https://github.com/beasleyr-vmw/repro-bazelbuild-bazel-16145 .

Which operating system are you running Bazel on?

CentOS 8

What is the output of bazel info release?

5.2.0-vmware

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What’s the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

I’m goofing around w/ IntelliJ and I set a breakpoint at https://github.com/bazelbuild/bazel/blob/5.2.0/src/main/java/com/google/devtools/build/lib/exec/AbstractSpawnStrategy.java#L279. Single-stepping through the code under the debugger, I observed the following:

  • The local branch wins and proceeds to cancel the remote branch.
  • Even though the remote branch is to be cancelled, because I’m single-stepping on the thread of the dynamic strategy’s local execution branch, the remote execution strategy proceeds and the gRPC download begins. (I have a terminal running watch -n .5 ls bazel-bin/whatever/... and observe myexpectedoutput.tmp appearing.)
  • I finish single-stepping and Bazel resumes normal operation. However, I observe that the .tmp files are never reaped.
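
The sequence observed above can be modelled with a small toy example. This is only a sketch under loud assumptions: it is not Bazel's code, `naive_download`, `Cancelled`, and `simulate` are hypothetical names, and it assumes the losing branch writes to `<output>.tmp` next to the real output and does no cleanup when it is cancelled part-way.

```python
import os


class Cancelled(Exception):
    """Raised when the hypothetical download is cancelled part-way."""


def naive_download(dest, chunks, cancel_after=None):
    """Write chunks to dest + '.tmp', then rename the result onto dest.

    If cancel_after is set, the download stops after that many chunks and
    performs no cleanup, which models the stray file seen in the issue.
    """
    tmp = dest + ".tmp"
    with open(tmp, "wb") as f:
        for i, chunk in enumerate(chunks):
            if cancel_after is not None and i >= cancel_after:
                raise Cancelled(dest)
            f.write(chunk)
    os.replace(tmp, dest)


def simulate(out_dir):
    """The local branch writes the output; the losing download is cut short."""
    dest = os.path.join(out_dir, "myexpectedoutput")
    payload = [b"x" * 1024] * 7
    with open(dest, "wb") as f:  # the local branch wins and writes 7 KiB
        f.write(b"".join(payload))
    try:
        naive_download(dest, payload, cancel_after=1)  # cancelled after 1 KiB
    except Cancelled:
        pass
    return sorted(
        (name, os.path.getsize(os.path.join(out_dir, name)))
        for name in os.listdir(out_dir)
    )
```

Running `simulate` on an empty directory leaves a 7168-byte `myexpectedoutput` alongside a 1024-byte `myexpectedoutput.tmp`, the same shape as the smaller-than-expected shadows described in the report.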

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

tjgq commented, Aug 26, 2022

I agree that this is likely the same issue as https://github.com/bazelbuild/bazel/pull/11340#issuecomment-629246895 (“internal” in that discussion means the Google-internal counterpart of remote build execution).

I’ve put together PR #16170 to implement the fix proposed by jmmv. (I’m cheating a bit by using a hash instead of a counter; it will probably have to change before submission.)

@beasleyr-vmw Would you like to give this PR a spin and see if the stray .tmp files are gone? I tried producing a consistent repro but wasn’t successful.

beasleyr-vmw commented, Aug 27, 2022

> @beasleyr-vmw Would you like to give this PR a spin and see if the stray .tmp files are gone? I tried producing a consistent repro but wasn’t successful.

Thanks for putting this together so soon. This absolutely helps and solves the problem of .tmp files landing in my output directory. One thing missing is that the .tmp files are left behind under _tmp/actions, but afaict they’re cleaned before the next execution.

Might be too late to be useful, but I finally put together a repro case: https://github.com/beasleyr-vmw/repro-bazelbuild-bazel-16145
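
The behaviour reported after the fix (no .tmp files in the output directory, leftovers only under _tmp/actions) is consistent with staging downloads outside the output tree under unique names. The sketch below illustrates that general idea only; it is not the code from PR #16170, `stage_then_move` and `staging_dir` are hypothetical names, and the counter-based naming is an assumption chosen for illustration.

```python
import itertools
import os

_next_id = itertools.count()  # process-wide counter for unique staging names


def stage_then_move(dest, chunks, staging_dir):
    """Write chunks to a uniquely named file in staging_dir, then move it to dest.

    If iterating chunks raises (for example on cancellation), the partial
    file stays in staging_dir and never appears next to dest.
    """
    os.makedirs(staging_dir, exist_ok=True)
    staged = os.path.join(staging_dir, f"{next(_next_id)}.tmp")
    with open(staged, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # Assumes staging_dir and dest are on the same filesystem.
    os.replace(staged, dest)
    return staged
```

With this layout an interrupted transfer can still leave a partial file behind, but only in the staging directory, so consumers of the tree artifact never see it. Reaping the staging directory is a separate cleanup step that this sketch does not attempt.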
