Is noconnect really deprecated? HTTPS request over HTTPS proxy times out; HTTP request over HTTPS proxy works as expected.
Description
Possibly similar to #5286?
The following curl commands all work as expected:
$ curl --proxy 'https://user:pass@sub.domain.com:port' https://ipinfo.io/ip # https over https proxy
# yields proxy_ip
$ curl --proxy 'https://user:pass@sub.domain.com:port' http://ipinfo.io/ip # http over https proxy
# yields proxy_ip
$ curl http://ipinfo.io/ip # http
# yields home_ip
$ curl https://ipinfo.io/ip # https
# yields home_ip
Using the Scrapy shell:
fetch(scrapy.Request('https://ipinfo.io/ip', meta={'download_timeout': 5, 'proxy': 'https://user:pass@sub.domain.com:port'})) # https over https proxy
# Times out or hangs indefinitely if no timeout specified
fetch(scrapy.Request('http://ipinfo.io/ip', meta={'download_timeout': 5, 'proxy': 'https://user:pass@sub.domain.com:port'})) # http over https proxy
# yields proxy_ip
fetch(scrapy.Request('http://ipinfo.io/ip', meta={'download_timeout': 5})) # http
# yields home_ip
fetch(scrapy.Request('https://ipinfo.io/ip', meta={'download_timeout': 5})) # https
# yields home_ip
Steps to Reproduce
- Fetch an HTTPS request using an HTTPS proxy (a standalone reproduction sketch follows this list)
Expected behavior: response received
Actual behavior: Scrapy hangs indefinitely, or times out if a timeout is specified
Reproduces how often: 100%
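The same behaviour reproduces outside the shell with a minimal standalone spider. This is only a sketch: the proxy URL below is a placeholder (fake credentials and an assumed port, 3128) and has to be replaced with a working HTTPS proxy.

import scrapy
from scrapy.crawler import CrawlerProcess

# Placeholder proxy URL: fake credentials and an assumed port (3128);
# replace with a real HTTPS proxy before running.
PROXY = "https://user:pass@sub.domain.com:3128"

class IpSpider(scrapy.Spider):
    name = "ipinfo"

    def start_requests(self):
        # HTTPS target through the HTTPS proxy: hangs or times out.
        yield scrapy.Request(
            "https://ipinfo.io/ip",
            meta={"proxy": PROXY, "download_timeout": 5},
        )
        # HTTP target through the same proxy: returns the proxy IP as expected.
        yield scrapy.Request(
            "http://ipinfo.io/ip",
            meta={"proxy": PROXY, "download_timeout": 5},
        )

    def parse(self, response):
        self.logger.info("%s -> %s", response.url, response.text.strip())

if __name__ == "__main__":
    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(IpSpider)
    process.start()  # blocks until the crawl finishes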
Versions
Output of scrapy version --verbose:
Scrapy : 2.4.1
lxml : 4.7.1.0
libxml2 : 2.9.12
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.10.2 | packaged by conda-forge | (main, Feb 1 2022, 19:29:00) [GCC 9.4.0]
pyOpenSSL : 22.0.0 (OpenSSL 1.1.1f 31 Mar 2020)
cryptography : 36.0.0
Platform : Linux-5.4.0-100-generic-x86_64-with-glibc2.31
Additional context
Conda environment.yml
channels:
- conda-forge
- defaults
dependencies:
- python >=3.9.0
- scrapy
- brotlipy
- zstandard
- rich
- genanki
- imagemagick
- apsw
- pip
- pip:
  - m3u8
  - switch
`conda env export` output
channels:
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=1_gnu
- appdirs=1.4.4=pyh9f0ad1d_0
- apsw=3.37.0.r1=py310h4988143_0
- atk-1.0=2.36.0=h3371d22_4
- attrs=21.4.0=pyhd8ed1ab_0
- automat=20.2.0=py_0
- bcrypt=3.2.0=py310h6acc77f_2
- brotlipy=0.7.0=py310h6acc77f_1003
- bzip2=1.0.8=h7f98852_4
- ca-certificates=2021.10.8=ha878542_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cairo=1.16.0=ha00ac49_1009
- cffi=1.15.0=py310h0fdd8cc_0
- chevron=0.14.0=pyhd3deb0d_1
- colorama=0.4.4=pyh9f0ad1d_0
- commonmark=0.9.1=py_0
- constantly=15.1.0=py_0
- cryptography=36.0.0=py310h9ce1e76_0
- cssselect=1.1.0=py_0
- dataclasses=0.8=pyhc8e2a94_3
- expat=2.4.4=h9c3ff4c_0
- fftw=3.3.10=nompi_h77c792f_102
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=hab24e00_0
- fontconfig=2.13.94=ha180cfb_0
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- freetype=2.10.4=h0708190_1
- fribidi=1.0.10=h36c2ea0_0
- frozendict=2.3.0=py310h6acc77f_1
- future=0.18.2=py310hff52083_4
- gdk-pixbuf=2.42.6=h04a7f16_0
- genanki=0.13.0=pyhd8ed1ab_0
- gettext=0.19.8.1=h73d1719_1008
- ghostscript=9.54.0=h9c3ff4c_1
- giflib=5.2.1=h36c2ea0_2
- graphite2=1.3.13=h58526e2_1001
- graphviz=2.50.0=h85b4f2f_1
- gtk2=2.24.33=h539f30e_1
- gts=0.7.6=h64030ff_2
- harfbuzz=3.3.1=hb4a5f5f_0
- hyperlink=21.0.0=pyhd3deb0d_0
- icu=69.1=h9c3ff4c_0
- idna=3.3=pyhd8ed1ab_0
- imagemagick=7.1.0_23=pl5321hb118871_0
- incremental=21.3.0=pyhd8ed1ab_0
- itemadapter=0.4.0=pyhd8ed1ab_0
- jbig=2.1=h7f98852_2003
- jpeg=9e=h7f98852_0
- ld_impl_linux-64=2.36.1=hea4e1c9_2
- lerc=3.0=h9c3ff4c_0
- libdeflate=1.8=h7f98852_0
- libffi=3.4.2=h7f98852_5
- libgcc-ng=11.2.0=h1d223b6_12
- libgd=2.3.3=h3cfcdeb_1
- libgfortran-ng=11.2.0=h69a702a_12
- libgfortran5=11.2.0=h5c6108e_12
- libglib=2.70.2=h174f98d_2
- libgomp=11.2.0=h1d223b6_12
- libiconv=1.16=h516909a_0
- libnsl=2.0.0=h7f98852_0
- libpng=1.6.37=h21135ba_2
- librsvg=2.52.5=hc3c00ef_1
- libstdcxx-ng=11.2.0=he4da1e4_12
- libtiff=4.3.0=h6f004c6_2
- libtool=2.4.6=h9c3ff4c_1008
- libuuid=2.32.1=h7f98852_1000
- libwebp=1.2.2=h3452ae3_0
- libwebp-base=1.2.2=h7f98852_1
- libxcb=1.13=h7f98852_1004
- libxml2=2.9.12=h885dcf4_1
- libxslt=1.1.33=h0ef7038_3
- libzlib=1.2.11=h36c2ea0_1013
- lxml=4.7.1=py310ha5446b1_0
- lz4-c=1.9.3=h9c3ff4c_1
- ncurses=6.3=h9c3ff4c_0
- openjpeg=2.4.0=hb52868f_1
- openssl=3.0.0=h7f98852_2
- pango=1.48.10=h54213e6_2
- parsel=1.6.0=py_0
- pcre=8.45=h9c3ff4c_0
- perl=5.32.1=1_h7f98852_perl5
- pip=22.0.3=pyhd8ed1ab_0
- pixman=0.40.0=h36c2ea0_0
- pkg-config=0.29.2=h36c2ea0_1008
- pthread-stubs=0.4=h36c2ea0_1001
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.7=py_0
- pycparser=2.21=pyhd8ed1ab_0
- pydispatcher=2.0.5=py_1
- pygments=2.11.2=pyhd8ed1ab_0
- pyopenssl=22.0.0=pyhd8ed1ab_0
- python=3.10.2=hc74c709_3_cpython
- python_abi=3.10=2_cp310
- pyyaml=6.0=py310h6acc77f_3
- queuelib=1.6.2=pyhd8ed1ab_0
- readline=8.1=h46c0cb4_0
- rich=11.1.0=pyhd8ed1ab_0
- scrapy=2.4.1=py310h06a4308_0
- service_identity=18.1.0=py_0
- setuptools=60.7.1=py310hff52083_0
- six=1.16.0=pyh6c4a22f_0
- sqlite=3.37.0=h9cd32fc_0
- tk=8.6.11=h27826a3_1
- twisted=21.7.0=py310h6acc77f_1
- typing-extensions=4.0.1=hd8ed1ab_0
- typing_extensions=4.0.1=pyha770c72_0
- tzdata=2021e=he74cb21_0
- w3lib=1.22.0=pyh9f0ad1d_0
- wheel=0.37.1=pyhd8ed1ab_0
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libice=1.0.10=h7f98852_0
- xorg-libsm=1.2.3=hd9c2040_1000
- xorg-libx11=1.7.2=h7f98852_0
- xorg-libxau=1.0.9=h7f98852_0
- xorg-libxdmcp=1.1.3=h7f98852_0
- xorg-libxext=1.3.4=h7f98852_1
- xorg-libxrender=0.9.10=h7f98852_1003
- xorg-libxt=1.2.1=h7f98852_2
- xorg-renderproto=0.11.1=h7f98852_1002
- xorg-xextproto=7.3.0=h7f98852_1002
- xorg-xproto=7.0.31=h7f98852_1007
- xz=5.2.5=h516909a_1
- yaml=0.2.5=h7f98852_2
- zlib=1.2.11=h36c2ea0_1013
- zope.interface=5.4.0=py310h6acc77f_1
- zstandard=0.17.0=py310h6acc77f_0
- zstd=1.5.2=ha95c52a_0
- pip:
  - iso8601==1.0.2
  - m3u8==1.0.0
  - switch==1.1.0
Top GitHub Comments
Sounds like it. Let us close this issue and track work on TLS 1.3 support instead. Thank you for your feedback!!!
Duplicate of #4821.
I can only say that Scrapy does the same thing when you use `http://` in the proxy URL and when you use `?noconnect` instead: https://github.com/scrapy/scrapy/blob/c316ca45a5b1b19622c96049c9378d8c45adba60/scrapy/core/downloader/handlers/http11.py#L292-L314

If you use `https://`, it goes the `CONNECT` way. `?noconnect` allows keeping the HTTPS protocol in the client-proxy connection while not using the `CONNECT` approach, so I am starting to think it makes sense to keep that option around, i.e. undeprecate it, given that there seem to be proxies out there that only support that.

But I would like to know whether or not it is normal for the no-CONNECT HTTPS option to be supported by proxies, and if there is a way that this could be elegantly negotiated between Scrapy and the proxy, so that things work as expected without users needing to append `?noconnect` to the proxy URL.

On the other hand, maybe the `?noconnect` approach is less secure (maybe it exposes all data to the proxy server?), in which case it is better to force users to include it in the URL, and to cover in the documentation for this option any security information users need to know when using this type of proxy connection.
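For concreteness, the workaround under discussion is simply appending `?noconnect` to the proxy URL set in the request meta. A minimal sketch follows; the proxy credentials, host, and port (3128) are placeholders, whether it helps depends entirely on the proxy, and on current Scrapy versions this path emits a deprecation warning.

import scrapy

# Placeholder proxy URL (fake credentials, assumed port 3128) with the
# ?noconnect flag appended; this asks the HTTP/1.1 download handler to send
# the request to the proxy directly instead of opening a CONNECT tunnel.
PROXY_NOCONNECT = "https://user:pass@sub.domain.com:3128?noconnect"

class NoConnectCheckSpider(scrapy.Spider):
    name = "noconnect-check"

    def start_requests(self):
        yield scrapy.Request(
            "https://ipinfo.io/ip",
            meta={"proxy": PROXY_NOCONNECT, "download_timeout": 5},
        )

    def parse(self, response):
        # If the proxy supports the no-CONNECT style of HTTPS proxying,
        # this should log the proxy IP instead of hanging.
        self.logger.info("%s -> %s", response.url, response.text.strip())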