Stop vendoring packages

The current implementation of Setuptools has its dependencies (separately for setuptools and pkg_resources) vendored into the package. The project implemented this approach because of the bootstrapping problem.

Bootstrapping Problem

Setuptools extended distutils, which was designed to sit at the root of a packaging system and presumed to be present and without any dependencies. Over time, Setuptools superseded distutils but inherited these constraints. In particular, when a system integrator wishes to build a system from sources, it requires a directed acyclic graph (DAG) of both build-time and runtime dependencies. As a result, Setuptools cannot depend on any package that uses Setuptools to build (or whose dependencies require Setuptools to build). This bootstrapping problem includes Setuptools itself, as it requires itself to build.
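
To make the DAG constraint concrete, here is a minimal sketch using the standard library's graphlib (Python 3.9+). The dependency edges are hypothetical and chosen only to produce a cycle; they are not real metadata for any of the named projects.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical build-time requirements: each key needs its values to be
# available before it can be built. Illustrative edges only.
build_requires = {
    "setuptools": {"packaging"},   # suppose setuptools declared this dependency
    "packaging": {"setuptools"},   # ...and packaging were built with setuptools
    "application": {"setuptools"},
}


def build_order(graph):
    """Return a valid build order, or raise CycleError if none exists."""
    return list(TopologicalSorter(graph).static_order())


def find_cycle(graph):
    """Return the cycle that prevents a build order, or None if there is none."""
    try:
        build_order(graph)
    except CycleError as exc:
        # The second element of args is the list of nodes forming the cycle.
        return exc.args[1]
    return None


cycle = find_cycle(build_requires)
print("cycle:", cycle)

# Dropping the declared dependency (which is what vendoring effectively does)
# makes the graph acyclic again.
vendored = dict(build_requires, setuptools=set())
print("order:", build_order(vendored))
```

With the cycle present, no build order exists, and that is the situation a system integrator building from source cannot resolve.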

Vendoring

As the ecosystem grew more complex, standards-based implementations (such as packaging) appeared as third-party packages rather than in the standard library, and Setuptools found itself requiring dependencies. Because of the bootstrapping problem, it could not declare them in the usual way, so it adopted a vendoring strategy (copying the packages directly into the project) to obtain that functionality.

However, this approach creates constraints and complications for the project:

  • Not all packages can be vendored.
    • If the other package has vendored dependencies, those may not work in a vendored context.
    • Some packages have global state or modify global state or have interfaces that are reliant on the package layout (incl. pkg_resources, importlib_metadata, packaging), leading to unexpected failures when loaded in a global context.
  • Refactoring functionality out of the library is difficult if not impossible due to the constraints above. In particular, this project would like to move pkg_resources into a separate package, but even though pkg_resources has no dependency on setuptools to run, it still must vendor its own dependencies (is this true?).
  • When vendoring a package, it often is required to be rewritten to accommodate vendoring. Any absolute imports must be replaced by relative imports.
  • Because vendoring is a second-class approach to dependency management (and unsustainable in the general case), it often requires specialized tooling to manage the dependencies and this management can often fall in conflict with the first-class tools.
  • When vendoring dependencies, it’s the responsibility of the hosting package to re-write imports to point to the vendored copies, creating non-standard usage with sometimes unclear semantics.
  • Due to the constraints above, adding a new dependency can be an onerous process, requiring extra care and testing that may break in downstream environments whose workflows aren’t proven.
  • Because vendored dependencies are in fact de-facto satisfied, the project cannot and should not declare those dependencies as other projects do. Therefore, it’s not possible to inspect the dependencies readily as one would with standard declarations.
  • Because vendored dependencies are effectively pinned, they create impedance and require manual intervention or extra, non-standard tooling to manage the evolution of those dependencies, defaulting to a practice of erosion.
    • Because they’re pinned, it’s more difficult to discern if a particular dependency is pinned for a known good reason or simply because of the vendoring.
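
For readers unfamiliar with what "rewriting imports" involves, the following is a rough, line-based sketch. It is not how Setuptools or any particular tool does it: the function name rewrite_imports and its parameters are made up for illustration, it rewrites to an absolute path inside a vendor namespace (one of several possible styles), and it handles only the two simplest statement forms.

```python
import re


def rewrite_imports(source, vendored, namespace):
    """Rewrite simple absolute imports of vendored packages (illustrative only).

    Handles two statement forms:
        import pkg                -> from <namespace> import pkg
        from pkg[.sub] import ... -> from <namespace>.pkg[.sub] import ...
    Other statements that begin with "import <vendored name>" (for example
    "import pkg.sub" or "import pkg as alias") raise ValueError instead of
    being rewritten incorrectly. Everything else is passed through unchanged.
    """
    if not vendored:
        return source
    names = "|".join(re.escape(name) for name in sorted(vendored))
    plain_import = re.compile(rf"^(\s*)import ({names})\s*$")
    from_import = re.compile(rf"^(\s*)from ({names})(\.[\w.]+)? import (.+)$")
    unsupported = re.compile(rf"^\s*import ({names})\b")

    out = []
    for line in source.splitlines():
        if m := plain_import.match(line):
            line = f"{m.group(1)}from {namespace} import {m.group(2)}"
        elif m := from_import.match(line):
            sub = m.group(3) or ""
            line = f"{m.group(1)}from {namespace}.{m.group(2)}{sub} import {m.group(4)}"
        elif unsupported.match(line):
            raise ValueError(f"cannot rewrite import: {line.strip()!r}")
        out.append(line)
    return "\n".join(out)


original = "import packaging\nfrom packaging.version import Version\nimport os"
rewritten = rewrite_imports(original, {"packaging"}, "pip._vendor")
print(rewritten)
```

Even this toy version shows why the list above calls the result non-standard usage: every import of a vendored package now names the host project, and anything the rewriter does not recognise has to be handled by hand.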

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

1 reaction
mgorny commented, Mar 26, 2022

Thanks for the explanation. If the end goal is for setuptools to host the schemas, then let’s focus on the second option indeed. It’d still be nice to be able to easily regenerate the resulting data, but I don’t think that’s a priority, i.e. it’s something to put on the “far TODO”.

1 reaction
pradyunsg commented, Oct 28, 2021

I don’t think we can do this yet.

Not until pip removes the fallbacks to direct setup.py invocation (https://pip.pypa.io/en/stable/reference/build-system/setup-py/ – in-progress) and the separation of pkg_resources is completed. Without those, I think this would exceed our available churn budget. 😃

> (e) Discourage installation of Setuptools except when needed to build a package.

setuptools is installed with pip, by all supported mechanisms to install pip – https://github.com/pypa/pip/issues/10530#issuecomment-932937829. IMO this should be considered a pre-condition for doing this, and it’s worthwhile spending a few months or so publicising this change before we actually make it.


FWIW, pip’s solution for this is something I maintain separately: https://github.com/pradyunsg/vendoring (I’ll consistently use vendoring to refer to this project in the rest of this post). I’m happy to accommodate setuptools in that. That should help alleviate many of the pain points with vendoring dependencies.

If you’re curious about how the tool works, my suggestion is to clone pip and run tox -e vendoring/nox -s vendoring and look at the vendor.txt, tools/vendoring/patches/*, and pyproject.toml files in pip’s source tree.

  • vendoring provides automated import rewriting, with an explicit error on import styles that can’t be re-written.
    • The main caveat is when a package dynamically determines the import path (e.g. by constructing a string to pass to __import__); in these cases, the dynamic import logic needs to be patched manually.
  • Having an explicit requirements file (with == pins, for obvious reasons) serve as the source of truth for this alleviates the lack of transparency.

> • When vendoring a package, it often is required to be rewritten to accommodate vendoring. Any absolute imports must be replaced by relative imports.
> • Because vendoring is a second-class approach to dependency management (and unsustainable in the general case), it often requires specialized tooling to manage the dependencies and this management can often fall in conflict with the first-class tools.
> • When vendoring dependencies, it’s the responsibility of the hosting package to re-write imports to point to the vendored copies, creating non-standard usage with sometimes unclear semantics.
> • Due to the constraints above, adding a new dependency can be an onerous process, requiring extra care and testing that may break in downstream environments whose workflows aren’t proven.
> • Because vendored dependencies are in fact de-facto satisfied, the project cannot and should not declare those dependencies as other projects do. Therefore, it’s not possible to inspect the dependencies readily as one would with standard declarations.

I think these constraints are non-existent / workable, if you adopt vendoring. 😃

> Because vendored dependencies are effectively pinned, they create impedance and require manual intervention or extra, non-standard tooling to manage the evolution of those dependencies, defaulting to a practice of erosion.

This is somewhat true – you still have the caveat of needing to manage evolution, but you can use standard tooling (like dependabot) to manage the upgrading, if you go down the vendoring route. It’s also possible to have separate unpinned/pinned dependency declaration sets (à la pip-compile’s workflow). To use vendoring, you do need to ensure that the entire dependency tree is included and pinned.

Basically, the moment you start vendoring stuff, you need to start thinking of the project as being managed like an application with pinned dependencies – all the corresponding dependency management constraints apply (except you run vendoring sync . instead of pip install -r [blah blah] to “install” the dependencies).

> Because they’re pinned, it’s more difficult to discern if a particular dependency is pinned for a known good reason or simply because of the vendoring.

If you adopt vendoring, this is not true.

It is possible to include comments in the input file to the tool (it’s basically a requirements.txt file that’s consumed by pip), which can be useful to describe this nuance.
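
As a rough illustration of a fully pinned, commented input file, here is a small checker. It is a simplification under stated assumptions: parse_pins is a made-up helper, the package names and versions are invented, and the real requirements format accepts far more than this regex does (markers, options, hashes).

```python
import re

# Matches "name==version", optionally followed by an inline comment.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s#]+)\s*(?:#\s*(.*))?$")


def parse_pins(text):
    """Parse a requirements-style file where every entry must be an exact pin.

    Returns a dict mapping name -> (version, reason). The reason is the inline
    comment if present, otherwise the comment line directly above the entry.
    Raises ValueError for any entry that is not an exact "==" pin.
    """
    pins = {}
    pending_comment = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            pending_comment = None
            continue
        if line.startswith("#"):
            pending_comment = line.lstrip("#").strip()
            continue
        m = PIN.match(line)
        if m is None:
            raise ValueError(f"not an exact pin: {line!r}")
        name, version, inline = m.groups()
        pins[name] = (version, inline or pending_comment)
        pending_comment = None
    return pins


# Hypothetical file contents; names, versions and reasons are invented.
example = """\
# held back: newer releases drop a feature we still use
alpha==1.2.3
beta==4.5.6
"""

pins = parse_pins(example)
for name, (version, reason) in pins.items():
    print(name, version, reason or "(no reason recorded: pinned only because vendored)")
```

Recording the reason next to the pin is what lets a reader tell a deliberate hold-back from a pin that exists only because the package is vendored.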

> Refactoring functionality out of the library is difficult if not impossible due to the constraints above. In particular, this project would like to move pkg_resources into a separate package, but even though pkg_resources has no dependency on setuptools to run, it still must vendor its own dependencies (is this true?).

Yep, and… it shouldn’t be too difficult if you use vendoring to bring that package into setuptools.

> Not all packages can be vendored.

True. Anything that’s non-pure-Python or provides an importable API for plugins is non-vendorable.

> If the other package has vendored dependencies, those may not work in a vendored context.

Generally not true, based on my experience with vendoring. It is possible to rewrite their imports seamlessly.

> Some packages have global state or modify global state or have interfaces that are reliant on the package layout (incl. pkg_resources, importlib_metadata, packaging), leading to unexpected failures when loaded in a global context.

I’m not quite sure what you mean here – setuptools.extern.packaging.version.Version is always going to compare not equal to pip._vendor.packaging.version.Version because they’re different packages (and possibly different versions of packaging too!). If that’s what you’re referring to, yeah… but I don’t see this as being a big problem.

The only case where this can be an issue is if some code expects this difference to exist (or not exist), since that would be fragile. It’s often really straightforward to avoid that though.
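
The effect described above can be reproduced without any vendored packages. In this sketch the Version class is a tiny stand-in (not the real packaging code), and the module names are hypothetical; it only assumes the common pattern of returning NotImplemented when the other operand is not an instance of the same class.

```python
import types

# A tiny stand-in for a version class, executed twice under different names.
MODULE_SOURCE = '''
class Version:
    def __init__(self, text):
        self.parts = tuple(int(p) for p in text.split("."))

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)
'''


def load_copy(name):
    """Execute the same source as a separate module object."""
    module = types.ModuleType(name)
    exec(MODULE_SOURCE, module.__dict__)
    return module


# Hypothetical module names standing in for two vendored copies.
copy_a = load_copy("project_a._vendor.versions")
copy_b = load_copy("project_b._vendor.versions")

same_copy = copy_a.Version("1.0") == copy_a.Version("1.0")
cross_copy = copy_a.Version("1.0") == copy_b.Version("1.0")
print(same_copy, cross_copy)
```

Each copy defines its own class object, so the isinstance check fails across copies and the comparison falls back to identity. Code that passes such objects between two vendored copies would need to compare something neutral, such as their string form.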
