Improve performance of loading the pull request view in virtual repositories
- Extension version: v0.35.2021121309
- VSCode Version: 1.64-insider
- OS: Windows 11, Edge
Steps to Reproduce:
- Open https://insiders.vscode.dev/github/python/cpython/pull/29581
- 🐛 GHPRI pull request view takes over 2 min to become ready
Dev tools didn’t offer any insight into the big stall, so I added some local logging in RemoteHub and found that `createPatch` from the `diff` npm package does not seem to perform well on large inputs. In the PR above, there are changes to a file with 34k LOC: https://github.com/python/cpython/blob/f4095e53ab708d95e019c909d5928502775ba68f/Parser/parser.c. Generating the diff for this file takes >40 seconds (about 47 seconds in the run below, going by the log timestamps):
[2021-12-13 19:52:19:908] [ 353] IGit.diffBetween(f4095e53ab708d95e019c909d5928502775ba68f, 3a642e61aa3e40cd0035e6661dd176a420bc265e, Parser/parser.c): Creating patch
[2021-12-13 19:53:07:202] [ 353] IGit.diffBetween(f4095e53ab708d95e019c909d5928502775ba68f, 3a642e61aa3e40cd0035e6661dd176a420bc265e, Parser/parser.c): Created patch
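The two log lines above bracket the patch generation. A minimal sketch of that kind of instrumentation is below. `timeIt` is a hypothetical helper, not the logging actually added to RemoteHub, and the workload in the usage line is a stand-in, since the real measurement would wrap the `createPatch` call from the `diff` package.

```typescript
// Hypothetical helper: runs a synchronous function, logs how long it took,
// and returns both the result and the elapsed wall-clock time.
function timeIt<T>(label: string, fn: () => T): { result: T; elapsedMs: number } {
  console.log(`${label}: Creating patch`);
  const start = Date.now();
  const result = fn();
  const elapsedMs = Date.now() - start;
  console.log(`${label}: Created patch (${elapsedMs} ms)`);
  return { result, elapsedMs };
}

// Stand-in workload. In the scenario described here, the callback would
// instead call createPatch with the old and new contents of Parser/parser.c.
const timed = timeIt("diffBetween(Parser/parser.c)", () => "a\nb\nc".split("\n").length);
```

Wrapping only the suspect call keeps the measurement separate from network and rendering time, which is what makes a single slow function stand out.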
I isolated the expensive `createPatch` call to this sample repo, which you can clone locally and run with `npm i; node index.js`: https://github.com/joyceerhl/vscode-diff-repro
In desktop with a local clone of the python/cpython repo, there is a perceptible delay as well, but it is not as bad (~10s to generate a diff, according to the logging that Lad put into the git extension). github.com has similar performance issues for large diffs, and they deal with this by not rendering large diffs in PRs, as well as truncating large files in the file view.
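One way to picture the github.com approach of skipping large diffs is a size guard that runs before any patch is generated. This is only a sketch: the threshold is an arbitrary assumption (it is not github.com's real limit or a GHPRI setting), and `countLines` and `shouldGenerateDiff` are hypothetical names.

```typescript
// Assumed cutoff, chosen only for illustration.
const MAX_DIFF_LINES = 20000;

// Counts newline-separated segments; an empty string counts as 0 lines.
function countLines(text: string): number {
  if (text.length === 0) {
    return 0;
  }
  let lines = 1;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) {
      lines++;
    }
  }
  return lines;
}

// Returns false when either side of the diff exceeds the line budget,
// so the caller can show a "diff too large" placeholder instead.
function shouldGenerateDiff(oldText: string, newText: string, maxLines: number = MAX_DIFF_LINES): boolean {
  return Math.max(countLines(oldText), countLines(newText)) <= maxLines;
}
```

With a 34k LOC file such as `Parser/parser.c`, a guard like this would skip the expensive patch entirely, at the cost of not showing that file's changes inline.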
`createPatch` does seem to be the culprit: commenting out this call to `Repository.diffBetween` and installing GHPRI in vscode.dev on the same PR above cuts the load time for the pull request view from >2 min to >20s. That is still not ideal, but it is already a significant improvement and accounts for the lion’s share of the problem:
https://github.com/microsoft/vscode-pull-request-github/blob/d476e27f3820101298912497fa42bdb7be58c65b/src/view/reviewManager.ts#L478-L484
I have two questions which might inform where we go next:
- In GHPRI, can we defer generating diffs until the user requests them (e.g. when the user wants to diff changes to a file), instead of generating all the diffs before the PR view loads? For example, the large file above with 34k LOC is not actually the one immediately shown in the diff editor; it’s a different file altogether. (I’m not super familiar with the GHPRI extension or with tree views, but it seems like deferring the request for patch contents may involve a rather large refactor, so please let me know if there is a more pragmatic approach we can take here.)
- In RemoteHub, can we use a more performant library to generate patches, and why is `createPatch` so slow? (For example, Google’s diff-match-patch JS implementation generates a different output format, so it’s not a drop-in replacement for the `diff` library, but it takes 500ms to process the same inputs versus >40000ms, and both use Myers’ algorithm, which is the same algorithm that `git` itself uses.)
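The first question, deferring diff generation until a file is actually opened, could be sketched as a small memoizing cache. Everything here is an assumption for illustration: `PatchGenerator` and `LazyPatchCache` are hypothetical names, and the generator stands in for whatever wraps `Repository.diffBetween`. It is not how GHPRI or RemoteHub is actually structured.

```typescript
// Hypothetical shape of whatever computes the patch for one file.
type PatchGenerator = (filePath: string) => Promise<string>;

class LazyPatchCache {
  private readonly generate: PatchGenerator;
  private readonly pending = new Map<string, Promise<string>>();

  constructor(generate: PatchGenerator) {
    this.generate = generate;
  }

  // Nothing is computed until the first request for a given file.
  // Repeated requests share the same in-flight or finished promise.
  getPatch(filePath: string): Promise<string> {
    const cached = this.pending.get(filePath);
    if (cached !== undefined) {
      return cached;
    }
    const patch = this.generate(filePath);
    this.pending.set(filePath, patch);
    // Forget failed attempts so that a later request can retry.
    patch.catch(() => {
      this.pending.delete(filePath);
    });
    return patch;
  }
}
```

Under this design the PR view would list changed files without touching the cache, and only the file the user opens in the diff editor would pay the patch-generation cost.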
The merged change to do the `diffBetween` later resulted in ~13s to see the tree after reload, down from ~60s on my machine. Closing since there have been significant perf improvements.