`--use-deprecated=html5lib` does not parse links, even though they're present


Description

When using pip 22.0 with --use-deprecated=html5lib and a JFrog Artifactory repository as the package index, pip fails with the error: ERROR: No matching distribution found for requests

Tested with the “requests” package on Windows 10 using pip 22.0 (fails) and pip 21.3.1 (works).

Expected behavior

--use-deprecated=html5lib should allow JFrog indexes to work.

pip version

22.0

Python version

3.10

OS

Windows

How to Reproduce

Install a package from a JFrog index using pip 22.0 with --use-deprecated=html5lib.

Output

C:\>python -m pip install -vvv requests --use-deprecated=html5lib
Using pip 22.0 from <corporate_local_path>\lib\site-packages\pip (python 3.10)
Non-user install by explicit request
Created temporary directory: <corporate_user_path>\AppData\Local\Temp\pip-ephem-wheel-cache-4a5e6ucc
Created temporary directory: <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Initialized build tracking at <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Created build tracker: <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Entered build tracker: <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Created temporary directory: <corporate_user_path>\AppData\Local\Temp\pip-install-_cnfjhxu
Looking in indexes: http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple
1 location(s) to search for versions of requests:
* http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Fetching project page and analyzing links: http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Getting page http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Found index url http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple
Looking up http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/ in the cache
Request header has "max_age" as 0, cache bypassed
Starting new HTTP connection (1): <corporate_domain>:80
http://<corporate_domain>:80 "GET /artifactory/api/pypi/pypi-release/simple/requests/ HTTP/1.1" 200 None
Updating cache with response from http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Skipping link: not a file: http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Given no hashes to check 0 links for project 'requests': discarding no candidates
ERROR: Could not find a version that satisfies the requirement requests (from versions: none)
ERROR: No matching distribution found for requests
Exception information:
Traceback (most recent call last):
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 348, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 173, in _add_to_criteria
    raise RequirementsConflicted(criterion)
pip._vendor.resolvelib.resolvers.RequirementsConflicted: Requirements conflict: SpecifierRequirement('requests')
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "<corporate_local_path>\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 481, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 350, in resolve
    raise ResolutionImpossible(e.criterion.information)
pip._vendor.resolvelib.resolvers.ResolutionImpossible: [RequirementInformation(requirement=SpecifierRequirement('requests'), parent=None)]
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "<corporate_local_path>\lib\site-packages\pip\_internal\cli\base_command.py", line 165, in exc_logging_wrapper
    status = run_func(*args)
  File "<corporate_local_path>\lib\site-packages\pip\_internal\cli\req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "<corporate_local_path>\lib\site-packages\pip\_internal\commands\install.py", line 339, in run
    requirement_set = resolver.resolve(
  File "<corporate_local_path>\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 103, in resolve
    raise error from e
pip._internal.exceptions.DistributionNotFound: No matching distribution found for requests

Removed build tracker: '<corporate_user_path>\\AppData\\Local\\Temp\\pip-req-tracker-p0zhtye3'


Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

pfmoore commented, Jan 30, 2022

The issue seems to be simply that the HTML doesn’t include a doctype (which seems to be required by PEP 503 and the HTML5 spec).

I’m unsure whether this is something where we should be lenient in what we accept.

Edit: Never mind, I missed that this was about the old parsing using html5lib.
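
For anyone who wants to rule out the doctype question on their own index page, a minimal sketch of such a check follows. The helper name `has_html5_doctype` is hypothetical and is not part of pip; it only tests whether the raw page bytes begin with an HTML5 doctype, and it does not reproduce whatever validation pip itself performs.

```python
import re

# Hypothetical helper, not part of pip: matches an HTML5 doctype at the very
# start of the page, allowing leading whitespace and any letter case.
_DOCTYPE_RE = re.compile(rb"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(content: bytes) -> bool:
    """Return True if the raw page bytes begin with an HTML5 doctype."""
    return _DOCTYPE_RE.match(content) is not None


if __name__ == "__main__":
    # Illustrative pages only; a real index page would be read from disk.
    with_doctype = b"<!DOCTYPE html>\n<html><body></body></html>"
    without_doctype = b"<html><body></body></html>"
    print(has_html5_doctype(with_doctype))
    print(has_html5_doctype(without_doctype))
```

As the edit above notes, the doctype is probably not the cause when the html5lib code path is in use, so this check is mainly useful for eliminating it as a factor.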

pradyunsg commented, Jan 30, 2022

I’m able to reproduce this, with just pip’s parsing logic:

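from pathlib import Path

from pip._internal.index.collector import HTMLPage, parse_links

# /tmp/page.html is assumed to be a saved copy of the index page for the project.
content = Path("/tmp/page.html").read_bytes()
page = HTMLPage(content, "utf-8", "https://private.domain.example.com/index")

try:
    # Newer signature: explicitly request the deprecated html5lib parser.
    print("new", len(list(parse_links(page, use_deprecated_html5lib=True))))
except TypeError:
    # Older pip does not accept that keyword argument, so call without it.
    print("old", len(list(parse_links(page))))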

With pip 21.3.1:

❯ python /tmp/foo.py
old 208

With pip 22.0:

❯ python /tmp/foo.py
new 0
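
To confirm that the saved page really contains links, independently of pip's internals, the anchors can be counted with the standard library. This is only a rough sketch: `AnchorCollector` and `count_anchor_links` are hypothetical names, and the stdlib `html.parser` module is not the same parser as html5lib, so the count may differ slightly from pip's.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)
                break


def count_anchor_links(content: bytes, encoding: str = "utf-8") -> int:
    """Return the number of <a href=...> tags found in the page bytes."""
    parser = AnchorCollector()
    parser.feed(content.decode(encoding, errors="replace"))
    parser.close()
    return len(parser.hrefs)


if __name__ == "__main__":
    from pathlib import Path

    # Same assumed location as in the reproduction script above.
    print(count_anchor_links(Path("/tmp/page.html").read_bytes()))
```

A non-zero count here, combined with `new 0` from pip 22.0, would suggest the links are present in the page and are being lost in pip's parsing step rather than missing from the server's response.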

