Documentation example fails with `proxy URL with no authority`
Running the example from the documentation yields this:
10:11 $ scrapy runspider quotes.py
2018-07-11 10:12:04 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-07-11 10:12:04 [scrapy.utils.log] INFO: Versions: lxml 3.5.0.0, libxml2 2.9.3, cssselect 0.9.1, parsel 1.5.0, w3lib 1.19.0, Twisted 16.0.0, Python 2.7.12 (default, Dec 4 2017, 14:50:18) - [GCC 5.4.0 20160609], pyOpenSSL 0.15.1 (OpenSSL 1.0.2g 1 Mar 2016), cryptography 1.2.3, Platform Linux-4.4.0-130-generic-x86_64-with-Ubuntu-16.04-xenial
2018-07-11 10:12:04 [scrapy.crawler] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True}
2018-07-11 10:12:04 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
Unhandled error in Deferred:
2018-07-11 10:12:04 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/runspider.py", line 88, in run
self.crawler_process.crawl(spidercls, **opts.spargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 171, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 175, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 98, in crawl
six.reraise(*exc_info)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 80, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 105, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 36, in from_settings
mw = mwcls.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/downloadermiddlewares/httpproxy.py", line 29, in from_crawler
return cls(auth_encoding)
File "/usr/local/lib/python2.7/dist-packages/scrapy/downloadermiddlewares/httpproxy.py", line 22, in __init__
self.proxies[type] = self._get_proxy(url, type)
File "/usr/local/lib/python2.7/dist-packages/scrapy/downloadermiddlewares/httpproxy.py", line 39, in _get_proxy
proxy_type, user, password, hostport = _parse_proxy(url)
File "/usr/lib/python2.7/urllib2.py", line 721, in _parse_proxy
raise ValueError("proxy URL with no authority: %r" % proxy)
exceptions.ValueError: proxy URL with no authority: '/var/run/docker.sock'
2018-07-11 10:12:04 [twisted] CRITICAL:
Looks like the proxy code does not handle no_proxy correctly.
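For what it's worth, the failure can be reproduced without Scrapy at all: the traceback bottoms out in urllib2's `_parse_proxy`, which rejects any proxy value that is not a URL with an authority, and `getproxies()` hands the middleware every `*_proxy` environment variable, including `no_proxy`. A minimal sketch, assuming a Docker-style `no_proxy` entry pointing at the daemon socket as in the traceback above:

```python
import os

# Assumption: Docker tooling exported the daemon socket path in no_proxy,
# which is what the traceback above complains about.
os.environ['no_proxy'] = '/var/run/docker.sock'

try:
    # Python 2, as in the report
    from urllib import getproxies
    from urllib2 import _parse_proxy
except ImportError:
    # Python 3 equivalents
    from urllib.request import getproxies, _parse_proxy

# getproxies() returns every *_proxy variable keyed by its prefix, so the
# no_proxy value shows up under the key 'no' alongside real proxy URLs.
for scheme, url in getproxies().items():
    _parse_proxy(url)  # raises ValueError: proxy URL with no authority: '/var/run/docker.sock'
```

Since HttpProxyMiddleware runs this kind of loop in its `__init__`, the crawl dies during engine creation, before any request is made.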
Top GitHub Comments
I guess NO_PROXY handling is very open to specific interpretations and is not standardized. The Docker client describes its use of NO_PROXY for its own purposes here, while Scrapy can simply ignore any proxy entry that `urllib2._parse_proxy` fails to parse.

I'm not sure what could be a solution to this issue. `/var/run/docker.sock` seems to be the only case of a socket file appearing in a `no_proxy` environment variable. So either the socket file is ignored and never added to the list of proxies, or it is added without passing the socket file path to the `_get_proxy` method, which is what raises the error. But the second option will cause further errors when the proxy is parsed in other modules, so I think ignoring the socket file is the best option. Also, I can't find any other cases where a socket file is used in `no_proxy`. Thoughts?