Defer environment installation if hook would be skipped
See original GitHub issueRationale
When running pre-commit run
or during git commit
, all the virtual environments are initialized (fast) and dependencies for them are installed. The latter can be slow depending on many factors (number of dependencies, whether they are cached already, etc.).
pre-commit has logic to decide if certain hooks should be invoked at all and if there are no matching files the hook would be run on, it is “(no files to check)Skipped”. However, even when this is the case, the hook’s environment still would be installed.
Possible implementation
This could be improved by leveraging the Classifier
to check if there would be modified files at all just like in _run_single_hook
, and if not, don’t just skip running the hook but also skip installing its environment.
Our use case
Click for details....
Many monorepos, where hooks are possibly defined to run on files
in certain ^subdirectory/.*$
s would greatly benefit from this. In our case, many C++ developers only work in C+±related subdirectories; however, although they would never touch python files, if another, python-related hook’s dependencies change, the other hook’s python virtualenv is recreated.
Currently, even with cached packages, this recreation (just installing packages with pip) takes 2-3 minutes on modern hardware, multiplied by the number of hooks whose additional_dependencies
changed. In our repository, one of these virtualenvs takes around 8+ minute to install (from already cached wheels). This is acceptable for the python developers (after all they are changing code which is expected to be checked), however not for the rest of the team.
Example
$ export XDG_CACHE_HOME=/tmp/.PRECOMMITCACHE/
$ tree
$ .
├── project-with-arrow
│ └── test.py
└── project-with-hdfs
└── test.py
$ cat .pre-commit-config.yaml
repos:
- repo: https://github.com/PyCQA/pylint
rev: v2.14.5
hooks:
- name: pylint-arrow
alias: pylint-arrow
id: pylint
additional_dependencies:
- arrow
files: ^project-with-arrow/.*
args: ["--disable=W,C,R"]
- name: pylint-hdfs
alias: pylint-hdfs
id: pylint
additional_dependencies:
- hdfs
files: ^project-with-hdfs/.*
args: ["--disable=W,C,R"]
$
$ ####################
$ # Current behavior #
$ ####################
$ pip install --force-reinstall pre-commit
...
Successfully installed ... pre-commit-2.20.0
$ rm -rf $XDG_CACHE_HOME
$ echo "# Let's modify project-with-arrow!" >> project-with-arrow/test.py
$ git commit -a -m "Modified arrow. During commit, both environment will be installed, but only one will be used"
[INFO] Initializing environment for https://github.com/PyCQA/pylint.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:arrow.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:hdfs.
[INFO] Installing environment for https://github.com/PyCQA/pylint. # <---------------- Two envs installed
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/pylint. # <---------------- Two envs installed
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
pylint-arrow.............................................................Passed
pylint-hdfs..........................................(no files to check)Skipped # <-- One env used
[master bd2d394] Modified arrow. During commit, both environment will be installed, but only one will be used
1 file changed, 1 insertion(+)
$
$ #####################
$ # Proposed behavior #
$ #####################
$ git reset --hard HEAD~
$ pip install --force-reinstall git+https://github.com/imc-trading/pre-commit.git@imc
...
Successfully installed ... pre-commit-2.20.1
$ rm -rf $XDG_CACHE_HOME
$ echo "# Let's modify again project-with-arrow!" >> project-with-arrow/test.py
$ git commit -a -m "Modified arrow. During commit, only one environment will be installed"
[INFO] Initializing environment for https://github.com/PyCQA/pylint.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:arrow.
[INFO] Initializing environment for https://github.com/PyCQA/pylint:hdfs.
[INFO] Installing environment for https://github.com/PyCQA/pylint. # <---------------- Only one env installed
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
pylint-arrow.............................................................Passed
pylint-hdfs..........................................(no files to check)Skipped
[master ba4bdd5] Modified arrow. During commit, only one environment will be installed
1 file changed, 1 insertion(+)
A PoC quick&dirty implementation is available here.
Workarounds
- When running manually
pre-commit run <hook>
, only given hooks’ environments are installed - specifying hooks in
SKIP
Searched:
- All currently open issues
- subdirectory OR subfolder
- defer
- monorepo
Related:
- https://github.com/pre-commit/pre-commit/issues/2395
Good solution is to manually invoke
pre-commit run hook-id
, however duringgit commit
it would still install unused environments - https://github.com/pre-commit/pre-commit/issues/1110
Confining pre-commit hooks to separate subdirectories can be done by either
cd
ing in a system hook, or simply running (aliased) hooks with afiles
include pattern ofsubdirectory/.*
- https://github.com/pre-commit/pre-commit/pull/2480#issuecomment-1213047024
cd
ing out-of-the-box by pre-commit has already been discussed, and I also don’t think it’s a good idea
Issue Analytics
- State:
- Created a year ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
--from-ref
/--to-ref
is just a shortcut to select the file list, it doesn’t change anything about the executionOh, that’s surprising.
I would expect that
SKIP
ing on the developer’s machine is fine, as in, good enough compromise, since on CI it would check everything anyway. But then the more important question is, can this false positive happen duringpre-commit run --from-ref ... --to-ref ...
?