Validate VCS urls in hash-checking mode using their commit hashes


What’s the problem this feature will solve?

It is often useful to temporarily switch a requirement to a VCS URL, for example to refer to a pull request in a fork prior to release, and then switch back once the feature is released.

This practice is incompatible with --require-hashes, however. If any requirements are hashed, then the VCS URL requirement is rejected with “The editable requirement <foo> cannot be installed when requiring hashes, because there is no single file to hash” or “Can’t verify hashes for these requirements because we don’t have a way to hash version control repositories.”

This makes it more difficult to use hashes, which ultimately discourages secure pip use.

Describe the solution you’d like

Many VCS URLs contain sha1 hashes in the refspec. (sha1 is not a great hash – more discussion below.) A requirement like git+https://github.com/requests/requests.git@e52932c427438c30c3600a690fb8093a1d643ef3#egg=requests could be accepted with --require-hashes as long as the commit validates.
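As a rough illustration of what "a sha1 hash in the refspec" could mean in practice, here is a minimal sketch that decides whether a VCS URL is pinned to a full 40-character commit ID. The function name `pinned_commit` is hypothetical and this is not how pip parses requirements; it only assumes the ref follows the last "@" in the URL path.

```python
import re
from urllib.parse import urlsplit

# A full SHA-1 commit ID is 40 lowercase hex characters.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def pinned_commit(requirement_url):
    """Return the 40-hex commit ID a VCS URL is pinned to, or None.

    Hypothetical helper for illustration only; not part of pip.
    """
    path = urlsplit(requirement_url).path
    # The ref, if any, follows the last "@" in the path component
    # (an "@" in the host part, as in git+ssh URLs, is not in the path).
    _, sep, ref = path.rpartition("@")
    if sep and FULL_SHA1.fullmatch(ref):
        return ref
    return None
```

Under this sketch, a URL pinned to a branch name such as master returns None and would still be rejected in hash-checking mode, while the requests URL above returns its commit ID.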

Support on the pip side would help to encourage use of hashes by pip-tools users. A user could include git+https://github.com/requests/requests.git@master#egg in requirements.in, and use pip-compile --generate-hashes to write a pinned VCS URL along with other hashed requirements to requirements.txt. Currently pip-compile --generate-hashes necessarily generates an uninstallable requirements.txt with VCS URLs, which discourages use of --generate-hashes.

Alternative Solutions

  • Don’t require hashes for VCS URLs, on the grounds that some hashes are better than none. (I believe this is the approach taken by pipenv, though I haven’t verified.) This still checks hashes for most requirements, and the security risk of allowing some unhashed URLs is “attacker has compromised communications with a specific VCS server”, which for many users would not significantly change their risk profile.

  • (#4344) Same as above, but as an opt-in flag, either with a global flag or a --hash=skip flag per line.

  • Keep VCS URLs in a separate requirements file, as recommended on #4995. This adds complexity to the install process without any additional security over the flag option.

  • Some VCS URLs can be converted to artifact URLs, like https://github.com/requests/requests/archive/e52932c427438c30c3600a690fb8093a1d643ef3.zip#egg=requests, which allows them to be hashed. This doesn’t work for other packages, however, such as those that use setuptools_scm to set their version. For example, pip install https://github.com/jazzband/pip-tools/archive/f97e62ecb0d9b70965c8eff952c001d8e2722e94.zip will fail with “setuptools-scm was unable to detect version”.
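The last alternative can be sketched as follows, assuming a git+https GitHub URL pinned with @ref. Both function names are hypothetical. The archive URL pattern is the one shown in the example above, and the `--hash=sha256:...` form is the one pip accepts in requirements files. Downloading the archive is left out, so the sketch hashes bytes you have already fetched.

```python
import hashlib
from urllib.parse import urlsplit


def github_archive_url(vcs_url):
    """Turn a pinned git+https GitHub URL into a zip archive URL.

    Hypothetical helper; handles only the URL shape used in this issue.
    """
    parts = urlsplit(vcs_url)
    if parts.scheme != "git+https" or parts.netloc != "github.com":
        raise ValueError("only git+https GitHub URLs are handled in this sketch")
    repo_path, sep, ref = parts.path.rpartition("@")
    if not sep:
        raise ValueError("URL has no @ref to pin")
    if repo_path.endswith(".git"):
        repo_path = repo_path[: -len(".git")]
    url = f"https://github.com{repo_path}/archive/{ref}.zip"
    if parts.fragment:
        # Keep "#egg=name" so the project name stays explicit.
        url += "#" + parts.fragment
    return url


def hash_option(archive_bytes):
    """Format a sha256 digest of downloaded archive bytes as a --hash option."""
    return "--hash=sha256:" + hashlib.sha256(archive_bytes).hexdigest()
```

As noted above, this only helps for packages that build correctly from an archive without VCS metadata.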

Additional context

This feature request was discussed when --require-hashes was first added and ultimately rejected by the author because of the insecurity of sha1:

@jezdez on Oct 8, 2015 (Contributor): I was wondering about that, wouldn’t some VCSs at least allow providing a hash for a repo? Is that out of scope and need to be added at some point? Or just a silly idea?

@erikrose on Oct 8, 2015 (Author, Contributor): So yes, this is something I thought we might add in the future: pip would recognize hash-based git refspecs as okay for hash-checking mode. It would run git fsck over them to make sure the hashes really match. (git doesn’t check them implicitly, at least in older versions, allowing a malicious git server to pass off whatever it wants.) My only reservation is that SHA1 isn’t considered a very collision-resistant hash anymore.

@sigmavirus24 on Oct 12, 2015 (Member): git fsck is good except that sometimes genuine commits are bad. requests has a commit that was accepted through a PR and which has (much later) been determined to be invalid by git fsck. There’s nothing we can do now without rewriting all of the history since that commit if we were to fix it.

@erikrose on Oct 12, 2015 (Author, Contributor): I assume later commits fsck fine, correct? I’d be willing to accept that broken commits can’t be used as hash-checked VCS checkouts by pip. But the point is moot because SHA1 will probably be cost-effectively breakable in a few years.
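For context on "make sure the hashes really match": in a SHA-1 repository, git names each object by the SHA-1 of a short header plus the object's content, so part of what an integrity check does is recompute that ID. The following is a minimal sketch of that one step only. It is not what git fsck does in full (fsck also validates object format and connectivity), and the function names are hypothetical.

```python
import hashlib


def git_object_id(obj_type, content):
    """Compute the SHA-1 object ID git assigns to raw object content.

    The hashed bytes are "<type> <size>\\0" followed by the content,
    where type is e.g. "blob", "tree" or "commit".
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def matches_claimed_id(obj_type, content, claimed_id):
    """Check that content received from a server hashes to the ID requested."""
    return git_object_id(obj_type, content) == claimed_id.lower()
```

A commit object's content embeds the ID of its tree and its parent commits, which is why verifying one commit ID transitively covers the files and history it points to, subject to the SHA-1 concerns raised in this thread.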

Since then, there’s been a lot more discussion of the risks of using sha1 for refspecs following the SHAttered attack (where Google paid to create a PDF hash collision, and could have done the same for a git commit).

After that attack:

In short, a sha1 collision attack requires attackers to first issue a specially-crafted, suspicious benign commit, at great expense, and then replace it with a malicious one. So a security model that is concerned with sha1 attacks assumes that an attacker has already crafted a seemingly-benign commit and had it distributed – by which point, the Mercurial page argues, there are cheaper, easier, and more deniable ways to succeed in the attacker’s aim.

That argument is certainly debatable! So I’m hoping to discuss it here: would it be worth counting sha1 refspecs as “hashed” for now, in order to encourage use of pip install --require-hashes and pip-compile --generate-hashes?

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 16
  • Comments: 20 (8 by maintainers)

Top GitHub Comments

9 reactions
jcushman commented, Nov 3, 2021

Seems like handling all the edge cases has made this bug prohibitive for anyone to take on. Given how long it’s been open, I wonder if anyone is up for trying a simpler PR that just supports the per-requirement opt-out I suggested up-thread:

git+https://localhost:9000/requests/requests.git@e52932c427438c30c3600a690fb8093a1d643ef3#egg=requests --hash=skip

I imagine that’s a much simpler PR to write, and it lets projects move forward that are better off with 99% hashing than none (in particular because the requirements where they’re skipping hashing are likely to be ones they control). [Edit: and it doesn’t interfere with a more complete solution if-and-when someone is able to write one.]

3 reactions
mitar commented, Sep 18, 2020

Sadly, the commit ID is not a complete check. For example, if the repository uses git LFS to pull some data but git LFS is not installed on the target system, that data will not be pulled and your checkout of the repository will not really contain the expected files.

I do not think it is maintenance hell. You do not compute hash on the file system, you compute it on git’s view of the filesystem. So, .DS_Store should be in .gitignore and this is why it is not included in the hash. If it is not in .gitignore and it is present but on the source system it was not, then hash SHOULD fail. This is why it is there. To assure exact directory reproduction.

And the way to achieve this is simply to call git archive | sha256sum and this is it.
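A minimal Python sketch of that pipeline, assuming git is on PATH and the repository is already checked out locally. The function names are hypothetical. Whether the archive bytes stay identical across git versions and configurations is an assumption this sketch does not verify.

```python
import hashlib
import subprocess


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large archives are not held in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def git_archive_sha256(repo_dir, ref="HEAD"):
    """Rough equivalent of `git archive <ref> | sha256sum` run in repo_dir."""
    proc = subprocess.Popen(
        ["git", "archive", "--format=tar", ref],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
    )
    try:
        digest = sha256_of_stream(proc.stdout)
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"git archive exited with status {returncode}")
    return digest
```

Because git archive works from the committed tree rather than the working directory, untracked files such as .DS_Store do not affect the result.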

There have been talks about having pip build an sdist for source tree installations (to solve unrelated issues), and pip can use the built sdist’s hash here. That would be consistent.

+1 on that one. Yes, hashes should probably not be computed on repository directory, but on what is being seen as source to be installed from.

