
Git status API recursively lists all untracked files, causing browsers to hang


Description

The current implementation of git status uses the -u flag, which recursively lists every untracked file in every subdirectory of a Git repository. Jupyter users often work with datasets made up of a large number of small files. When such a dataset is untar’d into a working Git repository, the /git/status API returns an unbounded response that freezes the entire JupyterLab front-end, and the browser along with it.

This operation is not paginated, and the number of files returned is unbounded. For example, see the notebook here, which downloads the IMDb dataset. Downloading the data makes the entire JupyterLab application unresponsive and hangs the browser. My browser consumed ~10GB of RAM, slowing down the whole machine.

Reproduce

(Warning: This will hang your browser)

Download and untar the IMDb dataset into your Git working directory.

Alternatively,

  1. git clone https://github.com/udacity/sagemaker-deployment
  2. Run the SageMaker Project notebook in /Project
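To reproduce the effect without downloading the IMDb data, the sketch below fills a fresh repository with synthetic untracked files and counts the entries `git status --porcelain -z` reports under `--untracked-files=all` versus `--untracked-files=normal`. The layout (a few directories holding many tiny files) is an assumption meant to resemble an untar’d dataset, and the helper names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def make_untracked_tree(root: Path, n_dirs: int, files_per_dir: int) -> None:
    """Create n_dirs directories of files_per_dir tiny files (a stand-in dataset)."""
    for d in range(n_dirs):
        sub = root / f"data_{d}"
        sub.mkdir(parents=True, exist_ok=True)
        for f in range(files_per_dir):
            (sub / f"review_{f}.txt").write_text("x")


def count_status_entries(repo: Path, mode: str) -> int:
    """Count entries from `git status --porcelain -z --untracked-files=<mode>`."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "-z", f"--untracked-files={mode}"],
        cwd=repo, capture_output=True, check=True,
    ).stdout
    # Every entry ends with NUL. Rename entries would add a second path field,
    # but a repository containing only untracked files has none.
    return sum(1 for entry in out.split(b"\0") if entry)


def demo(n_dirs: int = 5, files_per_dir: int = 2000) -> dict:
    """Build a throwaway repo and compare the two untracked-file modes."""
    if shutil.which("git") is None:
        raise RuntimeError("git not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
        make_untracked_tree(repo, n_dirs, files_per_dir)
        return {mode: count_status_entries(repo, mode) for mode in ("all", "normal")}
```

With the defaults, `demo()` should report 10,000 entries for "all" (one per file) but only 5 for "normal" (one `?? data_N/` entry per untracked directory). The "all" count is the one that grows with the dataset.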

Expected behavior

In general, any unpaginated approach is undesirable, and the extension should ship sensible, performant defaults. For example, in the /git/all_history API, the output of git log is intentionally limited to 10 commits by default to avoid this kind of scenario (see https://github.com/jupyterlab/jupyterlab-git/blob/master/jupyterlab_git/git.py#L317-L325).
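The capped-history idea can be sketched as a helper that passes `--max-count` to `git log`, so the response size is fixed by a parameter rather than by the repository. The function names and the `history_count` parameter are illustrative assumptions, not the extension's actual code at the linked lines.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path


def bounded_log(repo: Path, history_count: int = 10) -> list[str]:
    """Return at most history_count commit hashes, newest first."""
    out = subprocess.run(
        ["git", "log", f"--max-count={history_count}", "--pretty=format:%H"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def demo(total_commits: int = 25, history_count: int = 10) -> list[str]:
    """Create total_commits empty commits, then ask for a capped history."""
    if shutil.which("git") is None:
        raise RuntimeError("git not found on PATH")
    # Inline identity so the demo works without any global git config.
    git = ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com",
           "-c", "commit.gpgsign=false"]
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run(git + ["init", "-q"], cwd=repo, check=True)
        for i in range(total_commits):
            subprocess.run(git + ["commit", "-q", "--allow-empty", "-m", f"commit {i}"],
                           cwd=repo, check=True)
        return bounded_log(repo, history_count)
```

Here `len(demo())` is 10 even though the repository holds 25 commits; git status offers no equivalent count limit, which is the gap this issue is about.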

Since git status has no native pagination, and given the adverse effects of this issue, the proposed approach is to use --untracked-files=normal (equivalently -unormal) or to omit the option altogether, since normal is Git's default unless status.showUntrackedFiles is configured (https://git-scm.com/docs/git-status#Documentation/git-status.txt---untracked-filesltmodegt). This mode does not recursively list every untracked file in each subdirectory: an untracked directory is reported as a single entry, which is still a reasonable default. This is the same UX as running git status in a terminal. Power users can fall back to the terminal, or to additional flags in the API, to display every untracked file.
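One way to keep the terminal-like default while still letting clients opt in to the full list is a hypothetical `untracked_files` parameter that defaults to Git's "normal" mode, paired with a parser for the NUL-separated porcelain output. Both function names are assumptions for illustration; the parser follows the documented porcelain v1 `-z` format, in which a rename or copy entry is followed by an extra field holding the original path.

```python
from __future__ import annotations

import os

VALID_MODES = ("no", "normal", "all")


def status_command(untracked_files: str = "normal") -> list[str]:
    """Build the status command; "normal" mirrors plain `git status` output
    unless status.showUntrackedFiles is configured otherwise."""
    if untracked_files not in VALID_MODES:
        raise ValueError(f"untracked_files must be one of {VALID_MODES}")
    return ["git", "status", "--porcelain", "-z",
            f"--untracked-files={untracked_files}"]


def parse_porcelain_z(raw: bytes) -> list[tuple[str, str]]:
    """Split `git status --porcelain -z` output into (XY, path) pairs."""
    fields = raw.split(b"\0")
    entries = []
    i = 0
    while i < len(fields):
        field = fields[i]
        i += 1
        if not field:
            continue  # empty chunk after the final NUL terminator
        xy = field[:2].decode("ascii")
        path = os.fsdecode(field[3:])  # a single space separates XY from the path
        if "R" in xy or "C" in xy:
            i += 1  # rename/copy entries carry the original path as the next field
        entries.append((xy, path))
    return entries
```

For example, `parse_porcelain_z(b"?? data/\0")` returns `[("??", "data/")]`: under normal mode a dataset directory holding thousands of files becomes one entry.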

Context

  • Python package version: 0.10.1
  • Extension version: 0.10.1
  • Git version: 2.13
  • Operating System and its version: Amazon Linux

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
fcollonval commented, Aug 13, 2020

So -u does not seem to be the best answer. But this is indeed an annoyance for users with lots of untracked files. As @ianhi’s tests demonstrate (thank you for those), I propose using a threshold based on the number of files:

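import subprocess
threshold = 1000  # hypothetical value; ideally an extension setting
cmd = ["git", "status", "--porcelain", "-z"]
# Recurse into untracked directories only when `git status -su` lists few entries
if len(subprocess.check_output(["git", "status", "-su"]).splitlines()) < threshold:
    cmd.append("-u")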

This dynamically reduces the number of files listed, but it is still only a first step.

Ideally, the threshold should be an extension setting, like the history limit.
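A variant of the threshold check above, sketched under the assumption that the threshold value comes from an extension setting: instead of running a full `git status -su` just to count entries, it streams `git ls-files --others --exclude-standard -z` (which lists untracked, non-ignored files recursively) and stops Git as soon as the count passes the threshold. The early exit is a substitution for illustration, not part of the proposal, and all function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path


def untracked_count_exceeds(repo: Path, threshold: int) -> bool:
    """True as soon as more than `threshold` untracked files have been seen."""
    with subprocess.Popen(
        ["git", "ls-files", "--others", "--exclude-standard", "-z"],
        cwd=repo, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
    ) as proc:
        try:
            seen = 0
            # read1 returns whatever is buffered, so counting starts early.
            for chunk in iter(lambda: proc.stdout.read1(65536), b""):
                seen += chunk.count(b"\0")  # one NUL terminates each path
                if seen > threshold:
                    return True
            return False
        finally:
            proc.kill()  # stop git early; harmless if it already exited


def choose_status_command(repo: Path, threshold: int) -> list[str]:
    """Recurse into untracked directories only when the list is small."""
    cmd = ["git", "status", "--porcelain", "-z"]
    if not untracked_count_exceeds(repo, threshold):
        cmd.append("--untracked-files=all")  # same as -u
    return cmd


def demo(n_files: int, threshold: int) -> list[str]:
    """Throwaway repo with n_files untracked files under data/."""
    if shutil.which("git") is None:
        raise RuntimeError("git not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
        (repo / "data").mkdir()
        for i in range(n_files):
            (repo / "data" / f"f{i}.txt").write_text("x")
        return choose_status_command(repo, threshold)
```

The design choice is that the cost of the check is bounded by the threshold rather than by the dataset, whereas counting the lines of `git status -su` still walks and lists every untracked file first.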

0 reactions
fcollonval commented, Sep 17, 2020

I prototyped windowing of the file list in #767. This is definitely a better way to go than this proposal.
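Windowing (list virtualization) renders only the rows that intersect the viewport, so the browser's rendering cost stays roughly constant however many files Git reports. #767 is a front-end change and its code is not reproduced here; the sketch below only illustrates the index arithmetic behind the technique, in Python for consistency with the rest of this page, and the names, row height, and overscan value are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: int  # index of the first row to render
    stop: int   # one past the last row to render


def visible_window(n_items: int, row_height: int, viewport_height: int,
                   scroll_top: int, overscan: int = 5) -> Window:
    """Rows that must be rendered for the current scroll position.

    `overscan` extra rows above and below reduce flicker while scrolling.
    """
    first = scroll_top // row_height
    visible = -(-viewport_height // row_height)  # ceiling division
    start = max(0, first - overscan)
    stop = min(n_items, first + visible + overscan)
    return Window(start, stop)
```

With 100,000 entries, 20 px rows and a 400 px viewport, `visible_window(100_000, 20, 400, 0)` yields `Window(start=0, stop=25)`, so only 25 rows are rendered. Windowing bounds what the browser renders, not the size of the /git/status response itself.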

