"Upload missing inputs" performance regression in Bazel 5.0 and 5.3
Description of the bug:
cc @coeuvre – this bug is very similar to #15872, but the regression actually occurred in the 5.0.0 release. I did a git bisect between 41feb616ae and 2ac6581, and it appears https://github.com/bazelbuild/bazel/commit/db15e47d0391d904c29e6e5c632089e2479e62c2 is the source of this slower behavior.
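For reference, the bisect workflow was roughly as follows (a sketch, assuming 41feb616ae is the older, fast endpoint and 2ac6581 the newer, slow one):

```sh
# In the bazel checkout: mark the endpoints and let git walk the range.
git bisect start 2ac6581 41feb616ae

# At each candidate commit: build Bazel, run the repro below against the
# test repo, check the "upload missing inputs" duration in the profile,
# then mark the commit.
bazel build src:bazel-dev
git bisect good   # or: git bisect bad
```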
In the actual repo affected by this, some actions have ~600k inputs, and this step takes nearly 7 seconds with a recent release-5.3.0 commit. (We thought the fix to #15872 might have helped here, but the timing is consistent with 5.0’s performance, and still slower than 4.2.)
Unfortunately this overhead was much smaller (<1s) with Bazel 4.2.2, and this regression, while not the same magnitude as that of #15872, is still significant for us.
What’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Using the same test repo, https://github.com/clint-stripe/action_many_inputs/, but with 300,000 inputs to the action (edit the number in `WORKSPACE` to change this):
I have a script that runs the same set of commands, which reproduces this very clearly (in the `action_many_inputs` repo, with `bazel` checked out and built in a sibling directory):

```sh
$ ../bazel/bazel-bin/src/bazel-dev test ... --config=remote --remote_download_toplevel --profile=/tmp/bazel-bisect.profile
```
Run this once, to ensure that the inputs have all been uploaded – we want to measure the time when no action inputs change.
Then, run this a few times (it’s ok if the test doesn’t actually run; I usually hit ctrl-c after ~10 seconds just to ensure the “upload missing inputs” step is complete), and check the timing in the profile.
If it’s helpful, you can look at just the event we care about:
cat "$PROFILE_PATH" | jq '.traceEvents[] | select(.name == "upload missing inputs") | .dur = (.dur/1e3)'
commit | run # | duration (ms) |
---|---|---|
db15e47d0391d904c29e6e5c632089e2479e62c2 | 1 | 2665 |
db15e47d0391d904c29e6e5c632089e2479e62c2 | 2 | 2513 |
db15e47d0391d904c29e6e5c632089e2479e62c2 | 3 | 2425 |
db15e47d0391d904c29e6e5c632089e2479e62c2 | 4 | 2484 |
3ada002c4fc690630eb6ce82b2c06bd0cc0bdda2 | 1 | 967 |
3ada002c4fc690630eb6ce82b2c06bd0cc0bdda2 | 2 | 679 |
3ada002c4fc690630eb6ce82b2c06bd0cc0bdda2 | 3 | 660 |
3ada002c4fc690630eb6ce82b2c06bd0cc0bdda2 | 4 | 560 |
Which operating system are you running Bazel on?
linux
What is the output of `bazel info release`?
No response
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
```sh
~/.bazel_binaries/bazel-5.0.0/bin/bazel build src:bazel-dev
```
What’s the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
I still have the full profiles from most of these bazel invocations, and I’m happy to share them if there’s anything else useful there. (Unfortunately these changes predate the more granular profiling that distinguishes between ‘collect digests’ and ‘find missing digests’.)
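On builds recent enough to emit those finer-grained events, the same extraction should work per phase (a sketch, assuming the event names match the labels above):

```sh
jq '.traceEvents[]
    | select(.name == "collect digests" or .name == "find missing digests")
    | {name, dur_ms: (.dur/1e3)}' "$PROFILE_PATH"
```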
Top GitHub Comments
@brentleyjones it’s not – that regression took these tests from ~2s to ~45s. This one is smaller, but the time is still several times higher than in 4.2. (I ran a few of the same tests on the release-5.3.0 branch at 9d57003f8d5735eda9bd6207c51f0e9faa6c797a, with about the same results.)
The execution time increase I would expect is due to the recursive visitor pattern when building the Merkle tree. Now, that pattern is already part of the old `MerkleTree` build, so for this case I assume the Merkle tree caching should not add any processing time. Looking at a real measurement would of course be interesting.