Git status API recursively lists all untracked files causing browsers to hang
Description
The current implementation of git status uses the `-u` flag, which recursively lists every untracked file in every sub-directory of a Git repository. Jupyter users often work with datasets made up of a large number of small files. When such a dataset is untarred into a working Git repository, the /git/status API returns an unbounded response that freezes the entire JupyterLab front-end as well as the browser.
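To illustrate the difference in response size, here is a simplified sketch that extracts untracked entries from `git status --porcelain` output. The two sample outputs and their paths are hypothetical, and the parser ignores the quoting of unusual paths and the `-z` output format.

```python
# Hypothetical porcelain v1 output for the same repository state.
# With -u (i.e. --untracked-files=all), every file inside an untracked directory is listed.
OUTPUT_ALL = """\
 M notebook.ipynb
?? data/aclImdb/train/pos/0_9.txt
?? data/aclImdb/train/pos/1_7.txt
?? data/aclImdb/train/neg/0_3.txt
?? data/aclImdb/test/pos/0_10.txt
"""

# With --untracked-files=normal, the untracked directory is reported as a single entry.
OUTPUT_NORMAL = """\
 M notebook.ipynb
?? data/
"""


def untracked_entries(porcelain: str) -> list[str]:
    """Return the paths of untracked entries from porcelain v1 output."""
    entries = []
    for line in porcelain.splitlines():
        # Porcelain v1 lines look like "XY PATH"; untracked entries use the code "??".
        if line.startswith("?? "):
            entries.append(line[3:])
    return entries


print(len(untracked_entries(OUTPUT_ALL)))  # 4
print(untracked_entries(OUTPUT_NORMAL))    # ['data/']
```

With a real dataset the `all` output grows with every file on disk, while the `normal` output stays at one line per untracked top-level directory.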
This operation is not paginated, and the number of files returned is unbounded. For example, see the notebook here, which downloads the IMDb dataset. Downloading the data makes the entire JupyterLab application unresponsive and hangs the browser; I noticed my browser using ~10GB of RAM and the whole machine slowing down.
Reproduce
(Warning: This will hang your browser)
Download and untar the IMDb dataset into your working directory.
Alternatively:
- git clone https://github.com/udacity/sagemaker-deployment
- Run the SageMaker Project notebook in /Project
Expected behavior
In general, any unpaginated approach is undesirable, and the extension's default behavior should rely on sensible, performant defaults. For example, in the /git/all_history API, the output of git log is intentionally limited to 10 entries by default to avoid this kind of scenario (see https://github.com/jupyterlab/jupyterlab-git/blob/master/jupyterlab_git/git.py#L317-L325).
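A minimal sketch of such a bounded default applied to a file list, assuming a hypothetical server-side `paginate` helper and `DEFAULT_PAGE_SIZE` constant (neither exists in jupyterlab_git): the full list is sliced into one page, and the response carries the offset needed to fetch the next page.

```python
from typing import Any, Dict, Sequence

DEFAULT_PAGE_SIZE = 10  # hypothetical default, mirroring the cap described for /git/all_history


def paginate(items: Sequence[Any], offset: int = 0, limit: int = DEFAULT_PAGE_SIZE) -> Dict[str, Any]:
    """Return one bounded page of `items` plus metadata for requesting the next page."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    page = list(items[offset:offset + limit])
    next_offset = offset + len(page)
    return {
        "files": page,
        "total": len(items),
        # None tells the client it has reached the end of the list.
        "next_offset": next_offset if next_offset < len(items) else None,
    }


files = [f"file_{i}.txt" for i in range(25)]
first = paginate(files)
print(len(first["files"]), first["next_offset"])  # 10 10
last = paginate(files, offset=20)
print(len(last["files"]), last["next_offset"])    # 5 None
```

This bounds what reaches the browser, but the server still runs git status over every file, so it addresses the front-end freeze rather than the cost of enumerating the files.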
Because git status has no native pagination, and given how severe the effects of this issue are, the proposed approach is to use --untracked-files=normal (-unormal in short form) or to omit the option altogether (https://git-scm.com/docs/git-status#Documentation/git-status.txt---untracked-filesltmodegt). This mode does not recursively list every untracked file in each sub-directory; an untracked directory is reported as a single entry, which is still a reasonable default. It is the same UX as running plain git status in a terminal. Power users who need every untracked file listed can fall back to the terminal, or to additional flags in the API.
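As a sketch of the proposal, a hypothetical `status_command` helper could build the argv for a machine-readable `git status` call with `normal` as the default untracked-files mode. Only `--porcelain` and `--untracked-files=<mode>` are real git options here; the helper name and exposing the mode as a parameter are assumptions.

```python
import shlex
from typing import List

# Modes accepted by `git status --untracked-files=<mode>`.
UNTRACKED_MODES = ("no", "normal", "all")


def status_command(untracked_mode: str = "normal") -> List[str]:
    """Build an argv list for `git status` (hypothetical helper).

    "normal" matches plain `git status` in a terminal; "all" reproduces
    the recursive listing that a bare `-u` requests.
    """
    if untracked_mode not in UNTRACKED_MODES:
        raise ValueError(f"untracked_mode must be one of {UNTRACKED_MODES}, got {untracked_mode!r}")
    return ["git", "status", "--porcelain", f"--untracked-files={untracked_mode}"]


print(shlex.join(status_command()))
# git status --porcelain --untracked-files=normal
```

The returned list can be passed directly to `subprocess.run(cmd, cwd=repo_path, capture_output=True, text=True)`; using an argv list rather than a shell string sidesteps shell quoting entirely.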
Context
- Python package version: 0.10.1
- Extension version: 0.10.1
- Git version: 2.13
- Operating System and its version: Amazon Linux
Top GitHub Comments
So `-u` does not seem to be the best answer. But this is indeed an annoying problem for users with lots of untracked files. As @ianhi's tests demonstrate (thank you for those), I propose some kind of threshold based on the number of files. This will dynamically reduce the number of files listed, but it will still only be a first step.
Ideally, the threshold should be an extension setting, as it is for the history.
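The comment does not spell out the threshold rule. One possible rule, sketched here under that assumption, is to return untracked paths unchanged up to a configurable threshold and otherwise collapse them to their top-level directories. `collapse_untracked` and `DEFAULT_UNTRACKED_THRESHOLD` are hypothetical names, and 100 is an arbitrary placeholder value.

```python
from typing import List

DEFAULT_UNTRACKED_THRESHOLD = 100  # hypothetical setting value, analogous to the history count


def collapse_untracked(paths: List[str], threshold: int = DEFAULT_UNTRACKED_THRESHOLD) -> List[str]:
    """Hypothetical rule: keep every path when there are at most `threshold`
    of them; otherwise report each top-level directory once, as "dir/"."""
    if len(paths) <= threshold:
        return list(paths)
    collapsed: List[str] = []
    seen = set()
    for path in paths:
        head, sep, _ = path.partition("/")
        # Files at the repository root contain no "/" and are kept as-is.
        entry = head + "/" if sep else head
        if entry not in seen:
            seen.add(entry)
            collapsed.append(entry)
    return collapsed


paths = [f"data/aclImdb/train/{i}.txt" for i in range(500)] + ["README.md"]
print(collapse_untracked(paths))                 # ['data/', 'README.md']
print(collapse_untracked(["a.txt", "b/c.txt"]))  # ['a.txt', 'b/c.txt']
```

Collapsing can still leave many entries when the repository root itself holds many untracked files, which is one reason a threshold alone is only a first step.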
I prototyped windowing the files list in #767. This is definitely a better way to go than this proposal.
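The code in #767 is not reproduced here. As a rough illustration of the general windowing technique for a list with a fixed row height, the hypothetical `visible_range` function below computes which rows to render from the scroll position. In the extension this logic would live in the TypeScript front-end; Python is used here only for illustration.

```python
import math
from typing import Tuple


def visible_range(scroll_top: float, viewport_height: float, row_height: float,
                  total_rows: int, overscan: int = 5) -> Tuple[int, int]:
    """Return the half-open [start, end) range of rows to render in a
    fixed-row-height list, padded by `overscan` rows on each side."""
    if row_height <= 0:
        raise ValueError("row_height must be positive")
    first = int(scroll_top // row_height)            # first row touching the viewport
    count = math.ceil(viewport_height / row_height)  # rows needed to fill the viewport
    end = min(total_rows, first + count + overscan)
    start = min(max(0, first - overscan), end)       # never past `end`, even if over-scrolled
    return start, end


# 100,000 untracked files, 24 px rows, a 600 px panel scrolled down 2,400 px.
print(visible_range(2400, 600, 24, 100_000))  # (95, 130)
```

Only rows in [start, end) are rendered, so the DOM cost stays roughly constant whether the list holds 100 or 100,000 files; the overscan rows reduce blank flashes while scrolling.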