Provide audit hook events during package installation
What’s the problem this feature will solve?
This feature will enable third-party tools to intercept package installations and provide features such as:
- audit installed code for security purposes before any code execution takes place (such as running setup.py which could contain malware)
- verify installed packages, for example, code signing wheels or individual packages (PGP or other mechanisms)
- check for typosquatting or against blacklist/whitelist of packages
This is just an example list of features that the third party implementations can provide by listening to the installation audit hooks.
Describe the solution you’d like
pip can use the functionality described in PEP 578 to fire custom sys audit hook events before installation takes place. Audit hooks are native in Python 3.8+ and aim to provide visibility into the Python process for external monitoring systems.
The audit hook (for example "pip.install") should also pass metadata about the installation in the same manner (a tuple of arguments) as the built-in Python hooks already do.
Proposed example of audit arguments:
- audit hook or pip version number as first argument so the format can be changed or extended in future pip versions if needed
- type of the installed package which would allow differentiating between wheels, sdists, local installs from a file/directory, git repositories etc…
- name of the package
- URL of the package
- package specifiers such as “==2.0.1” parsed from the requirements file
- dependency chain (x, y, z, ...) where x is the name of the package being installed, y is the parent package that has x as a dependency, z has y as a dependency, and so on up to the root level/package
- hash (from requirements file)
- filename such as simplewheel-1.0-py2.py3-none-any.whl
- local path to the downloaded file (preferable), or the directory that contains the package files that are going to be installed (e.g. the directory that contains setup.py, pyproject.toml etc…)
- additional flags (int combined using bitwise or) to denote attributes such as editable install, is_pinned, update (package is being updated)
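The flags argument in the last bullet could be packed roughly like this. It is only a sketch: the flag names and bit values are hypothetical and are not existing pip API.

```python
import enum


class InstallFlags(enum.IntFlag):
    """Hypothetical flag bits for the proposed 'additional flags' argument."""

    NONE = 0
    EDITABLE = 1   # editable install
    PINNED = 2     # requirement is pinned to an exact version
    UPDATE = 4     # an already installed package is being updated


def encode_flags(editable=False, pinned=False, update=False):
    """Combine the boolean attributes into one int using bitwise OR."""
    flags = InstallFlags.NONE
    if editable:
        flags |= InstallFlags.EDITABLE
    if pinned:
        flags |= InstallFlags.PINNED
    if update:
        flags |= InstallFlags.UPDATE
    return int(flags)


def has_flag(value, flag):
    """Check a single bit on the receiving (hook) side."""
    return bool(value & flag)
```

A plain int keeps the audit tuple free of pip-specific types, and new bits can be added later without changing the tuple length.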
Since not all installation/invocation methods provide the necessary attributes, those that are missing (for example the hash of the package) should be replaced with None. All of the proposed example metadata arguments are already available in the InstallRequirement class, from which they can be extracted into the tuple and passed to the audit hook.
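To make the proposed tuple concrete, here is a rough sketch of what firing the event could look like. The function name, the parameter names and the format version constant are made up for illustration. The argument order follows the list above, and missing values are passed as None.

```python
import sys

# Assumption: a small integer identifies the tuple layout so it can evolve.
HOOK_FORMAT_VERSION = 1


def fire_install_event(kind, name, url=None, specifier=None, chain=(),
                       package_hash=None, filename=None, local_path=None,
                       flags=0):
    """Build the proposed argument tuple and raise the audit event."""
    args = (
        HOOK_FORMAT_VERSION,  # format version first, so hooks can branch on it
        kind,                 # e.g. "wheel", "sdist", "directory", "vcs"
        name,
        url,
        specifier,            # e.g. "==2.0.1"
        tuple(chain),         # (x, y, z, ...) up to the root package
        package_hash,         # None when no hash was given
        filename,
        local_path,
        flags,
    )
    # sys.audit is available in Python 3.8+ (PEP 578).
    sys.audit("pip.install", *args)
    return args


# Example call with values that would come from the requirement being installed.
example_args = fire_install_event(
    "wheel",
    "simplewheel",
    specifier="==1.0",
    chain=("simplewheel",),
    filename="simplewheel-1.0-py2.py3-none-any.whl",
)
```

If no hook is registered, the sys.audit call does nothing, so there should be little cost for users who do not use the feature.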
These arguments should be sufficient for an external monitoring tool or listening audit hook to make an informed decision about the package and, if necessary, prevent its installation and any code execution (e.g. running setup.py) by raising an exception inside the hook handler. As described in the PEP, an exception raised inside the audit hook should not be caught by pip but propagated further, resulting in an unhandled exception, maybe including cleanup of temporary data created by pip?
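On the listening side, a monitoring tool could look something like the sketch below. The blocklist contents and the exception class are invented for the example, and it assumes the package name is the third tuple element, as in the proposed list.

```python
import sys

# Hypothetical blocklist, e.g. known typosquats of popular packages.
BLOCKED_NAMES = {"reqeusts"}


class InstallationBlocked(Exception):
    """Raised by the hook to stop an installation before any code runs."""


def audit_hook(event, args):
    # Hooks receive every audit event in the process, so filter first.
    if event != "pip.install" or len(args) < 3:
        return
    name = args[2]  # assumption: (version, type, name, ...) as proposed above
    if name in BLOCKED_NAMES:
        raise InstallationBlocked(f"installation of {name!r} is not allowed")


sys.addaudithook(audit_hook)


def is_allowed(name):
    """Demo helper only: pip itself would let the exception propagate."""
    try:
        sys.audit("pip.install", 1, "wheel", name)
    except InstallationBlocked:
        return False
    return True
```

An exception raised by a hook propagates out of the sys.audit call, which is what lets the handler abort the installation. Hooks added with sys.addaudithook cannot be removed again, which matters for any cleanup logic.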
There could also be other audit hooks fired by pip, such as on uninstall of a package or on pip invocation itself (e.g. pip.invoked with sys.argv as the audit tuple arguments).
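A small sketch of these additional events, seen from a listener that records everything with a "pip." prefix. The "pip.uninstall" event name and its single argument are assumptions, since only pip.invoked is named above.

```python
import sys

pip_events = []


def record_pip_events(event, args):
    # Keep only events in the (proposed) pip namespace.
    if event.startswith("pip."):
        pip_events.append((event, args))


sys.addaudithook(record_pip_events)

# What pip might raise at startup: the command line as the argument tuple.
example_argv = ["pip", "install", "simplewheel"]
sys.audit("pip.invoked", *example_argv)

# Hypothetical uninstall event carrying just the package name.
sys.audit("pip.uninstall", "simplewheel")
```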
Alternative Solutions
Entry points would allow for almost the same functionality; however, there might be a few additional problems. The first is speed, as the entry points would need to be imported during pip invocation, which could add delay. Exceptions thrown during that time (entry point import) could also cause pip to crash if not handled properly. I believe the system audit hooks are superior as they fit nicely into the Python ecosystem, since this is what audit hooks were designed for, and they avoid reinventing the wheel. Also, the cited PEP would provide a better reference for decisions such as the above-mentioned exception throwing.
Additional context
There were already similar tickets or discussions about providing a “plugin” or “hook” functionality that allows extending the installation process or gives visibility/auditing into packages that are going to be installed. The closest feature is probably https://github.com/pypa/pip/issues/1035
There are a few distinctions between the previous discussion/feature request and firing an audit hook. The discussion in that ticket got steered toward how signature verification should be correctly implemented, since getting the cryptography right is difficult. The same could be argued about audit hooks; however, they are not designed to provide security mechanisms or sandboxing, merely visibility into the black box that Python is. The same principle can be applied to pip installing packages, since pip is the de facto default tool in all modern Python installations.
I understand the reluctance to provide a public API, as that brings problems with maintainability. That could be mitigated by carefully selecting the attributes that are passed as metadata to the audit hook and, in combination with version numbers, future-proofing against any potential changes. Alternatively, the maintainability problem could be resolved almost completely by just passing (<pip_version>, <pip._internal.req.InstallRequirement object instance>) as the hook arguments and leaving the extraction of the necessary information to the monitoring system itself.
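The second variant, where the hook receives the object and extracts what it needs, might look like this on the monitoring side. A SimpleNamespace stands in for the real InstallRequirement instance, and the attribute names and version string are illustrative assumptions. getattr with a default keeps the hook from breaking if an attribute is missing or renamed in a later pip version.

```python
import sys
from types import SimpleNamespace

observed = []


def extract_metadata(req):
    """Read only what the monitor needs; tolerate missing attributes."""
    return {
        "name": getattr(req, "name", None),
        "editable": getattr(req, "editable", None),
        "link": getattr(req, "link", None),
    }


def object_hook(event, args):
    # Only handle the two-element (pip_version, requirement) form.
    if event != "pip.install" or len(args) != 2:
        return
    pip_version, req = args
    observed.append((pip_version, extract_metadata(req)))


sys.addaudithook(object_hook)

# Stand-in object; a real hook would receive pip's own requirement object.
stand_in = SimpleNamespace(name="simplewheel", editable=False, link=None)
sys.audit("pip.install", "20.1", stand_in)  # "20.1" is a placeholder version
```

The trade-off is that the monitoring tool then depends on pip internals, so it has to check the version element before trusting any attribute.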
Issue Analytics
- Created 3 years ago
- Reactions:2
- Comments:12 (9 by maintainers)
It would be great if this hook also got an argument with a local path to the directory with the sources that are (verbatim) going to be used for the following installation process. For example, if the installation requires extracting some archive and then installing the package from that, the argument would be the path to these extracted sources.
The reasons I am suggesting this:
Let’s make these audit hooks pip-specific. If someone wants to pick this up, please say so here and let us know how you’re thinking of implementing this! 😃