Provide audit hook events during package installation

What’s the problem this feature will solve?

This feature will enable third-party tools to intercept package installations and provide features such as:

  • audit installed code for security purposes before any code execution takes place (such as running setup.py which could contain malware)
  • verify installed packages, for example, code signing wheels or individual packages (PGP or other mechanisms)
  • check for typosquatting or against blacklist/whitelist of packages

This is just an example list of features that the third party implementations can provide by listening to the installation audit hooks.

Describe the solution you’d like

pip can leverage the functionality described in PEP 578 to fire (custom) sys audit hook events before installation takes place. Audit hooks are native in Python 3.8+ and aim to provide visibility into the Python process for external monitoring systems.

The audit hook (for example "pip.install") should also pass metadata about the installation in the same manner (a tuple of arguments) as the built-in Python hooks already do.

Proposed example of audit arguments:

  • audit hook or pip version number as first argument so the format can be changed or extended in future pip versions if needed
  • type of the installed package which would allow differentiating between wheels, sdists, local installs from a file/directory, git repositories etc…
  • name of the package
  • URL of the package
  • package specifiers such as “==2.0.1” parsed from the requirements file
  • dependency chain (x, y, z, ...) where x is the package being installed (name) and y is the parent package which has x as the dependency, z has a dependency y and so on up to the root level/package
  • hash (from requirements file)
  • filename such as simplewheel-1.0-py2.py3-none-any.whl
  • local path to the downloaded file (preferable), or directory that contains the package files that is going to be installed (e.g. directory that contains setup.py, pyproject.toml etc…)
  • additional flags (int combined using bitwise or) to denote attributes such as editable install, is_pinned, update (package is being updated)

Not all installation/invocation methods provide every attribute, so any that are missing (for example, the hash of the package) should be passed as None. All of the proposed example metadata arguments are already available in the InstallRequirement class, from which they can be extracted into the tuple and passed to the audit hook.
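The proposed argument tuple could look roughly like the sketch below. It is an illustration only, not pip's implementation: the field order follows the list above, while `AUDIT_FORMAT_VERSION`, the `FLAG_*` constants, `build_install_args`, and the example URL are hypothetical names and values chosen for this example. Unavailable attributes are passed as None.

```python
import sys

# Hypothetical values: the proposal only asks for a version number first and
# for flags combined with bitwise OR; the exact numbers are made up here.
AUDIT_FORMAT_VERSION = 1
FLAG_EDITABLE = 1
FLAG_PINNED = 2
FLAG_UPDATE = 4


def build_install_args(kind, name, url=None, specifier=None, chain=(),
                       hash_=None, filename=None, path=None, flags=0):
    """Pack the metadata in the proposed order; unknown values stay None."""
    return (AUDIT_FORMAT_VERSION, kind, name, url, specifier,
            tuple(chain), hash_, filename, path, flags)


received = []


def record_installs(event, args):
    # Audit hooks see every event in the process, so filter by name.
    if event == "pip.install":
        received.append(args)


sys.addaudithook(record_installs)

install_args = build_install_args(
    kind="wheel",
    name="simplewheel",
    url="https://example.invalid/simplewheel-1.0-py2.py3-none-any.whl",
    specifier="==1.0",
    chain=("simplewheel",),
    filename="simplewheel-1.0-py2.py3-none-any.whl",
    flags=FLAG_PINNED,
)
# This is the call pip would make just before installing the package.
sys.audit("pip.install", *install_args)
```

A listener receives the arguments as a plain tuple, so the leading version number is what lets it decide how to interpret the remaining positions.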

These arguments should be sufficient for an external monitoring tool (the listening audit hook) to make an informed decision about the package. The hook handler can prevent the installation, and any code execution such as running setup.py, by raising an exception. As described in the PEP, an exception raised inside the audit hook should not be caught by pip but propagated further, resulting in an unhandled exception, maybe including cleanup of temporary data created by pip?
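A minimal sketch of such a blocking hook follows. It assumes the package name arrives as the third positional argument, as in the proposed list; `BLOCKED_NAMES`, `InstallationBlocked`, and `fake_install` are hypothetical stand-ins and not part of pip.

```python
import sys

# Hypothetical denylist of typosquatted names.
BLOCKED_NAMES = frozenset({"reqeusts", "djanga"})


class InstallationBlocked(Exception):
    """Raised by the hook to abort an installation."""


def deny_listed_packages(event, args):
    if event != "pip.install" or len(args) < 3:
        return
    name = args[2]  # assumed position of the package name
    if isinstance(name, str) and name.lower() in BLOCKED_NAMES:
        raise InstallationBlocked(f"{name!r} is on the denylist")


sys.addaudithook(deny_listed_packages)


def fake_install(name):
    """Stand-in for pip's install step: audit first, then do the work."""
    sys.audit("pip.install", 1, "wheel", name)
    # Nothing below runs if a hook raised, so no package code is executed.
    return f"installed {name}"
```

Calling `fake_install("requests")` returns normally, while `fake_install("reqeusts")` raises `InstallationBlocked` out of `sys.audit` before the function body continues.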

There could also be other audit hooks fired by pip, such as for the uninstall of a package or for the pip invocation itself (e.g. pip.invoked with sys.argv as the audit tuple arguments).
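These additional events could be sketched as follows. The event name "pip.invoked" comes from the text above, while "pip.uninstall" and `fake_pip_main` are assumptions made for this example.

```python
import sys

audit_log = []


def log_pip_events(event, args):
    # Keep only the two pip events this sketch is interested in.
    if event in ("pip.invoked", "pip.uninstall"):
        audit_log.append((event, args))


sys.addaudithook(log_pip_events)


def fake_pip_main(argv):
    """Stand-in for pip's entry point; only the audit calls matter here."""
    sys.audit("pip.invoked", *argv)
    if len(argv) >= 3 and argv[1] == "uninstall":
        for name in argv[2:]:
            sys.audit("pip.uninstall", name)


fake_pip_main(["pip", "uninstall", "simplewheel"])
```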

Alternative Solutions

Entry points would allow for almost the same functionality; however, there might be a few additional problems. The first is speed, as the entry points would need to be imported during pip invocation, which could add delay. Exceptions thrown during that time (entry point import) could also cause pip to crash if not handled properly. I believe the system audit hooks are superior because they fit nicely into the Python ecosystem, which is the reason they were designed in the first place, and they avoid reinventing the wheel. Also, the cited PEP would provide a better reference for decisions such as the above-mentioned exception throwing.

Additional context

There have already been similar tickets and discussions about providing “plugin” or “hook” functionality that allows extending the installation process or gives visibility/auditing into packages that are going to be installed. The closest feature is probably https://github.com/pypa/pip/issues/1035

There are a few distinctions between the previous discussion/feature request and firing an audit hook. The discussion in that ticket was steered toward how signature verification should be implemented correctly, since getting the cryptography right is difficult. The same could be argued about audit hooks; however, they are not designed to provide security mechanisms or sandboxing, but merely visibility into the black box that Python is. The same principle can be applied to pip installing packages, since pip is the de facto default tool in all modern Python installations.

I understand the reluctance to provide a public API, as that brings maintainability problems. This could be mitigated by carefully selecting the attributes that are passed as metadata to the audit hook and, in combination with version numbers, future-proofing the format against any changes that might occur. Alternatively, the maintainability problem could be resolved almost completely by just passing (<pip_version>, <pip._internal.req.InstallRequirement object instance>) as the hook arguments and leaving the extraction of the necessary information to the monitoring system itself.
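The second option, passing the version and the requirement object, might look like the sketch below from the listener's side. `StandInRequirement` is a hypothetical substitute for pip's internal InstallRequirement so that the example runs without importing pip internals; the attribute names read by the hook are assumptions, which is why the hook uses `getattr` with a default instead of relying on them.

```python
import sys
from dataclasses import dataclass


@dataclass
class StandInRequirement:
    """Hypothetical stand-in for pip._internal.req.InstallRequirement."""
    name: str
    editable: bool = False


extracted = []


def extract_from_requirement(event, args):
    if event != "pip.install" or len(args) != 2:
        return
    pip_version, req = args
    # The object is internal to pip and may change between versions, so
    # every attribute is read defensively.
    extracted.append({
        "pip_version": pip_version,
        "name": getattr(req, "name", None),
        "editable": getattr(req, "editable", None),
        "missing": getattr(req, "attribute_that_does_not_exist", None),
    })


sys.addaudithook(extract_from_requirement)

# The version string here is only an example value.
sys.audit("pip.install", "22.0.4", StandInRequirement(name="simplewheel"))
```

With this design the compatibility burden moves to the monitoring tool, which can branch on the reported pip version when the internal object changes.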

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

1 reaction
steve-s commented, Mar 28, 2022

It would be great if this hook also received an argument with the local path to the directory with the sources that are (verbatim) going to be used for the following installation process. For example, if the installation requires extracting some archive and then installing the package from that, the argument would be the path to these extracted sources.

The reasons I am suggesting this:

  • the audit can go further and inspect that the archive was extracted to “non malicious” bits. I think that, in general, the closer the audit gets to the point of “these exact bits will be installed” the better.
  • the audit would not have to worry about archive formats if all it wants is to, for example, scan all the sources for some malicious code pattern
  • it could be also used to patch the sources before installation. In GraalPython we maintain patches for some Python packages to deal with some incompatibilities and unfortunately we have to patch pip itself to patch them before the installation.
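As a rough illustration of the scanning use case, a hook that receives the extracted source directory could walk it and reject suspicious files. Everything specific here is an assumption: the event name "pip.install.sources", the byte patterns, and the `SuspiciousSource` exception are invented for the sketch, and real scanning would need far more than substring matching.

```python
import os
import sys
import tempfile

# Hypothetical byte patterns; a real scanner would be much more thorough.
SUSPICIOUS_PATTERNS = (b"os.system(", b"base64.b64decode(")


class SuspiciousSource(Exception):
    """Raised by the hook when a scanned file matches a pattern."""


def find_suspicious_files(source_dir):
    """Return sorted relative paths of .py files containing a pattern."""
    hits = []
    for root, _dirs, files in os.walk(source_dir):
        for filename in files:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(root, filename)
            with open(path, "rb") as handle:
                data = handle.read()
            if any(pattern in data for pattern in SUSPICIOUS_PATTERNS):
                hits.append(os.path.relpath(path, source_dir))
    return sorted(hits)


def scan_sources(event, args):
    # Opening files inside the hook raises further audit events, which
    # this name check lets pass through untouched.
    if event != "pip.install.sources" or not args:
        return
    hits = find_suspicious_files(args[0])
    if hits:
        raise SuspiciousSource(f"suspicious files: {hits}")


sys.addaudithook(scan_sources)

# Usage: a throwaway directory plays the role of the extracted sources.
with tempfile.TemporaryDirectory() as clean_dir:
    with open(os.path.join(clean_dir, "setup.py"), "w") as handle:
        handle.write("print('nothing to see here')\n")
    sys.audit("pip.install.sources", clean_dir)  # passes silently
```

Because the hook sees the already extracted directory, it does not need to know whether the package came from a wheel, an sdist, or a repository checkout.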

1 reaction
pradyunsg commented, Mar 26, 2022

Let’s make these audit hooks pip-specific. If someone wants to pick this up, please say so here and let us know how you’re thinking of implementing this! 😃
