--find-links should not warn about missing HTML5 doctype
Description
This is not a duplicate of #10825 – I’ve moved a number of my comments from #10825 as this is a separate issue. That issue is about enforcing PEP 503, which states that servers implementing the python simple index protocol should have an HTML5 doctype. This is not about that.
When using --find-links, this warning can appear:
× The package index page being used does not have a proper HTML doctype declaration.
╰─> Problematic URL: https://www.tortall.net/~robotpy/wheels/2022/roborio/
note: This is an issue with the page at the URL mentioned above.
hint: You might need to reach out to the owner of that package index, to get this fixed. See https://github.com/pypa/pip/issues/10825 for context.
Currently the --find-links documentation says:
-f, --find-links <url> If a URL or path to an html file, then parse for links
to archives such as sdist (.tar.gz) or wheel (.whl)
files. If a local path or file:// URL that's a
directory, then look for archives in the directory
listing. Links to VCS project URLs are not supported.
There is no HTML5 doctype requirement mentioned.
To me, --find-links serves a very different purpose than a full PyPI-style index implementation. For environments where a full Python index is too much work (or in corporate environments where working with IT is really difficult), it’s very convenient to stick a bunch of files on a webserver and be able to point pip at an arbitrary directory listing and install packages from that directory. Unfortunately, the most popular webservers in the world (and even Python’s default http.server!) do not emit an HTML5 doctype by default, because it simply does not matter if all you’re doing is showing a directory listing so users can download a file.
You might say that it’s currently only a warning, and it’ll be a long time until we make it an error! But it’s a useless warning, and the only way to fully resolve it is to go to every webserver vendor in the world and tell them that they must use an HTML5 doctype in their directory listings because pip says so. And then those changes need to be backported to ‘stable’ Linux distributions like RHEL.
In many corporate environments, developers don’t get a choice of which webserver IT is using, and so this warning is just unnecessary noise and will waste hundreds of hours for developers and ops teams.
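To make the distinction concrete, here is a minimal sketch of the kind of check that would trigger such a warning. It is not pip's actual implementation; `has_html5_doctype` and the sample pages are hypothetical, and the non-HTML5 sample only stands in for the older-style declarations some directory listings emit.

```python
import re

# Hypothetical helper for illustration only; not pip's real check.
# Matches a bare HTML5 declaration such as "<!DOCTYPE html>" (case-insensitive).
_HTML5_DOCTYPE = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(page: str) -> bool:
    """Return True if the page begins with a bare <!DOCTYPE html> declaration."""
    return _HTML5_DOCTYPE.match(page) is not None


# Sample pages (made up): all three contain the same usable link.
html5_page = "<!DOCTYPE html>\n<a href='pkg-1.0.tar.gz'>pkg-1.0.tar.gz</a>"
older_doctype_page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n'
    "<a href='pkg-1.0.tar.gz'>pkg-1.0.tar.gz</a>"
)
no_doctype_page = "<pre><a href='pkg-1.0.tar.gz'>pkg-1.0.tar.gz</a></pre>"

for name, page in [
    ("html5", html5_page),
    ("older doctype", older_doctype_page),
    ("no doctype", no_doctype_page),
]:
    print(name, has_html5_doctype(page))
```

Only the first page passes the check, even though the link a user cares about is identical in all three.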
Production-quality web servers that don’t emit HTML5 doctype by default
- Apache2 (likely most popular webserver in the world) does not use an HTML5 doctype by default
- nginx (likely the second most popular) does not even have configurable autoindexing capabilities
Others that don’t
- Python’s default webserver (http.server)
- Twisted web framework
- golang’s default webserver doesn’t even bother adding a doctype of any kind
- WEBrick, a ruby webserver
- busybox httpd directory index
Those that do
- lighttpd, since late 2016
- Apache Tomcat, since late 2019
I appreciate that html5lib adds a lot of work for pip maintainers. If there’s a way to use html.parser and ignore the doctype (which the migration from an error to a warning suggests there is), it seems like that would save hundreds (thousands?) of person-hours for ops teams all around the world who would otherwise need to figure out how to reconfigure their webservers because pip is being unnecessarily picky.
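As a rough illustration of the suggestion above, the following sketch collects archive links from a directory listing with the standard library's html.parser and simply never looks at the doctype. It is an assumption about how such parsing could work, not pip's code; `LinkCollector`, `find_archive_links`, the sample listing, and the example.invalid URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags; doctype declarations are ignored."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))

    # handle_decl() is deliberately not overridden: the base class
    # implementation does nothing, so any doctype (or none) is accepted.


def find_archive_links(page: str, base_url: str) -> list[str]:
    """Return links that look like sdists or wheels (simplified suffix check)."""
    parser = LinkCollector(base_url)
    parser.feed(page)
    parser.close()
    return [url for url in parser.links if url.endswith((".whl", ".tar.gz"))]


# A made-up listing with no doctype at all.
listing = (
    "<html><body><h1>Index of /wheels</h1>"
    '<a href="../">../</a>'
    '<a href="demo-1.0-py3-none-any.whl">demo-1.0-py3-none-any.whl</a>'
    '<a href="demo-1.0.tar.gz">demo-1.0.tar.gz</a>'
    "</body></html>"
)
print(find_archive_links(listing, "https://example.invalid/wheels/"))
```

The suffix filter is a simplification: it would miss links carrying a URL fragment such as a hash, which a real implementation would need to handle.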
Thanks for your consideration.
Expected behavior
No warning
pip version
22.0.3
Python version
3.10
OS
any
How to Reproduce
N/A
Output
No response
Code of Conduct
- I agree to follow the PSF Code of Conduct.
Issue Analytics
- Created 2 years ago
- Reactions: 3
- Comments: 8 (7 by maintainers)
Top GitHub Comments
Honestly, I’d be fine with dropping the doctype check entirely as well.
Yes, thanks, I should have been clear that we did it as a (hopefully temporary) workaround so that things wouldn’t remain broken for users of latest pip while its maintainers work through this.