
Is noconnect really deprecated? https request over https proxy times out. http request over https proxy works as expected.

See original GitHub issue

Description

Possibly similar to #5286?

The following curl commands all succeed:

$ curl --proxy 'https://user:pass@sub.domain.com:port' https://ipinfo.io/ip # https over https proxy
# yields proxy_ip
$ curl --proxy 'https://user:pass@sub.domain.com:port' http://ipinfo.io/ip # http over https proxy
# yields proxy_ip
$ curl http://ipinfo.io/ip # http
# yields home_ip
$ curl https://ipinfo.io/ip # https
# yields home_ip

Using Scrapy Shell

fetch(scrapy.Request('https://ipinfo.io/ip', meta={'download_timeout': 5, 'proxy': 'https://user:pass@sub.domain.com:port'})) # https over https proxy
# Times out or hangs indefinitely if no timeout specified
fetch(scrapy.Request('http://ipinfo.io/ip', meta={'download_timeout': 5, 'proxy': 'https://user:pass@sub.domain.com:port'})) # http over https proxy
# yields proxy_ip
fetch(scrapy.Request('http://ipinfo.io/ip', meta={'download_timeout': 5})) # http
# yields home_ip
fetch(scrapy.Request('https://ipinfo.io/ip', meta={'download_timeout': 5})) # https
# yields home_ip
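For reference, the deciding factor in the cases above is the scheme of the proxy URL itself: https:// means the client-to-proxy connection is made over TLS. A small stdlib sketch makes that explicit (describe_proxy is a hypothetical helper written for this issue, and 8443 is a placeholder port, since the real one is redacted above):

```python
from urllib.parse import urlsplit

def describe_proxy(proxy_url: str) -> dict:
    """Break a proxy URL into the pieces a downloader needs.

    An https:// scheme means the client-to-proxy connection itself
    is made over TLS; for https targets, the CONNECT tunnel (or the
    request, in no-CONNECT mode) is then sent over that TLS link.
    """
    parts = urlsplit(proxy_url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,          # userinfo is stripped automatically
        "port": parts.port,
        "has_credentials": parts.username is not None,
        "tls_to_proxy": parts.scheme == "https",
    }

info = describe_proxy("https://user:pass@sub.domain.com:8443")
print(info["tls_to_proxy"], info["host"], info["port"])
# → True sub.domain.com 8443
```

In the failing case (https target, https proxy) Scrapy issues a CONNECT through that TLS link, which is where the reported hang occurs.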

Steps to Reproduce

  1. Fetch an HTTPS request through an HTTPS proxy

Expected behavior: response received

Actual behavior: Scrapy hangs indefinitely, or times out if a timeout is specified

Reproduces how often: 100%

Versions

Output of scrapy version --verbose:

Scrapy       : 2.4.1
lxml         : 4.7.1.0
libxml2      : 2.9.12
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 21.7.0
Python       : 3.10.2 | packaged by conda-forge | (main, Feb  1 2022, 19:29:00) [GCC 9.4.0]
pyOpenSSL    : 22.0.0 (OpenSSL 1.1.1f  31 Mar 2020)
cryptography : 36.0.0
Platform     : Linux-5.4.0-100-generic-x86_64-with-glibc2.31

Additional context

Conda environment.yml

channels:
  - conda-forge
  - defaults
dependencies:
  - python >=3.9.0
  - scrapy
  - brotlipy
  - zstandard
  - rich 
  - genanki
  - imagemagick
  - apsw
  - pip
  - pip:
    - m3u8
    - switch

`conda env export` output
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - appdirs=1.4.4=pyh9f0ad1d_0
  - apsw=3.37.0.r1=py310h4988143_0
  - atk-1.0=2.36.0=h3371d22_4
  - attrs=21.4.0=pyhd8ed1ab_0
  - automat=20.2.0=py_0
  - bcrypt=3.2.0=py310h6acc77f_2
  - brotlipy=0.7.0=py310h6acc77f_1003
  - bzip2=1.0.8=h7f98852_4
  - ca-certificates=2021.10.8=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - cairo=1.16.0=ha00ac49_1009
  - cffi=1.15.0=py310h0fdd8cc_0
  - chevron=0.14.0=pyhd3deb0d_1
  - colorama=0.4.4=pyh9f0ad1d_0
  - commonmark=0.9.1=py_0
  - constantly=15.1.0=py_0
  - cryptography=36.0.0=py310h9ce1e76_0
  - cssselect=1.1.0=py_0
  - dataclasses=0.8=pyhc8e2a94_3
  - expat=2.4.4=h9c3ff4c_0
  - fftw=3.3.10=nompi_h77c792f_102
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.13.94=ha180cfb_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.10.4=h0708190_1
  - fribidi=1.0.10=h36c2ea0_0
  - frozendict=2.3.0=py310h6acc77f_1
  - future=0.18.2=py310hff52083_4
  - gdk-pixbuf=2.42.6=h04a7f16_0
  - genanki=0.13.0=pyhd8ed1ab_0
  - gettext=0.19.8.1=h73d1719_1008
  - ghostscript=9.54.0=h9c3ff4c_1
  - giflib=5.2.1=h36c2ea0_2
  - graphite2=1.3.13=h58526e2_1001
  - graphviz=2.50.0=h85b4f2f_1
  - gtk2=2.24.33=h539f30e_1
  - gts=0.7.6=h64030ff_2
  - harfbuzz=3.3.1=hb4a5f5f_0
  - hyperlink=21.0.0=pyhd3deb0d_0
  - icu=69.1=h9c3ff4c_0
  - idna=3.3=pyhd8ed1ab_0
  - imagemagick=7.1.0_23=pl5321hb118871_0
  - incremental=21.3.0=pyhd8ed1ab_0
  - itemadapter=0.4.0=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jpeg=9e=h7f98852_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=3.0=h9c3ff4c_0
  - libdeflate=1.8=h7f98852_0
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=11.2.0=h1d223b6_12
  - libgd=2.3.3=h3cfcdeb_1
  - libgfortran-ng=11.2.0=h69a702a_12
  - libgfortran5=11.2.0=h5c6108e_12
  - libglib=2.70.2=h174f98d_2
  - libgomp=11.2.0=h1d223b6_12
  - libiconv=1.16=h516909a_0
  - libnsl=2.0.0=h7f98852_0
  - libpng=1.6.37=h21135ba_2
  - librsvg=2.52.5=hc3c00ef_1
  - libstdcxx-ng=11.2.0=he4da1e4_12
  - libtiff=4.3.0=h6f004c6_2
  - libtool=2.4.6=h9c3ff4c_1008
  - libuuid=2.32.1=h7f98852_1000
  - libwebp=1.2.2=h3452ae3_0
  - libwebp-base=1.2.2=h7f98852_1
  - libxcb=1.13=h7f98852_1004
  - libxml2=2.9.12=h885dcf4_1
  - libxslt=1.1.33=h0ef7038_3
  - libzlib=1.2.11=h36c2ea0_1013
  - lxml=4.7.1=py310ha5446b1_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - ncurses=6.3=h9c3ff4c_0
  - openjpeg=2.4.0=hb52868f_1
  - openssl=3.0.0=h7f98852_2
  - pango=1.48.10=h54213e6_2
  - parsel=1.6.0=py_0
  - pcre=8.45=h9c3ff4c_0
  - perl=5.32.1=1_h7f98852_perl5
  - pip=22.0.3=pyhd8ed1ab_0
  - pixman=0.40.0=h36c2ea0_0
  - pkg-config=0.29.2=h36c2ea0_1008
  - pthread-stubs=0.4=h36c2ea0_1001
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pydispatcher=2.0.5=py_1
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - python=3.10.2=hc74c709_3_cpython
  - python_abi=3.10=2_cp310
  - pyyaml=6.0=py310h6acc77f_3
  - queuelib=1.6.2=pyhd8ed1ab_0
  - readline=8.1=h46c0cb4_0
  - rich=11.1.0=pyhd8ed1ab_0
  - scrapy=2.4.1=py310h06a4308_0
  - service_identity=18.1.0=py_0
  - setuptools=60.7.1=py310hff52083_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.37.0=h9cd32fc_0
  - tk=8.6.11=h27826a3_1
  - twisted=21.7.0=py310h6acc77f_1
  - typing-extensions=4.0.1=hd8ed1ab_0
  - typing_extensions=4.0.1=pyha770c72_0
  - tzdata=2021e=he74cb21_0
  - w3lib=1.22.0=pyh9f0ad1d_0
  - wheel=0.37.1=pyhd8ed1ab_0
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libice=1.0.10=h7f98852_0
  - xorg-libsm=1.2.3=hd9c2040_1000
  - xorg-libx11=1.7.2=h7f98852_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h7f98852_1
  - xorg-libxrender=0.9.10=h7f98852_1003
  - xorg-libxt=1.2.1=h7f98852_2
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h7f98852_1002
  - xorg-xproto=7.0.31=h7f98852_1007
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h7f98852_2
  - zlib=1.2.11=h36c2ea0_1013
  - zope.interface=5.4.0=py310h6acc77f_1
  - zstandard=0.17.0=py310h6acc77f_0
  - zstd=1.5.2=ha95c52a_0
  - pip:
    - iso8601==1.0.2
    - m3u8==1.0.0
    - switch==1.1.0

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

1 reaction
Gallaecio commented, Mar 8, 2022

possibly some proxy providers are moving to only support new tls standard?

Sounds like it. Let us close this issue and track work on TLS 1.3 support instead. Thank you for your feedback!

Duplicate of #4821.

1 reaction
Gallaecio commented, Mar 2, 2022

I can only say that Scrapy does the same thing when you use http:// in the proxy URL and when you use ?noconnect instead: https://github.com/scrapy/scrapy/blob/c316ca45a5b1b19622c96049c9378d8c45adba60/scrapy/core/downloader/handlers/http11.py#L292-L314

If you use https://, it goes the CONNECT way. ?noconnect keeps the HTTPS protocol on the client-proxy connection while skipping the CONNECT approach, so I am starting to think it makes sense to keep that option around, i.e. undeprecate it, given that there seem to be proxies out there that only support it.

But I would like to know whether or not it is normal for the no-CONNECT HTTPS option to be supported by proxies, and if there is a way that this could be elegantly negotiated between Scrapy and the proxy, so that things work as expected without users needing to append ?noconnect to the proxy URL.

On the other hand, maybe the ?noconnect approach is less secure (maybe it exposes all data to the proxy server?), in which case it is better to force users to include this in the URL, and to cover in the documentation for this option any security information users need to know when using this type of proxy connection.
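To make that security trade-off concrete, here is a minimal sketch (first_request_line is a hypothetical helper, not Scrapy code) of the first request line a client sends to the proxy in each mode. In CONNECT mode the proxy only ever sees host:port, because TLS terminates at the origin server; in no-CONNECT (absolute-form) mode the proxy receives the full request, and therefore the plaintext URL, headers and body:

```python
from urllib.parse import urlsplit

def first_request_line(target_url: str, use_connect: bool) -> str:
    """Return the first request line sent to the proxy in each mode."""
    parts = urlsplit(target_url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    if use_connect:
        # Tunnel mode: the proxy forwards raw bytes; it sees only host:port.
        return f"CONNECT {parts.hostname}:{port} HTTP/1.1"
    # Absolute-form mode (the ?noconnect behavior): the proxy sees the full URL.
    return f"GET {target_url} HTTP/1.1"

print(first_request_line("https://ipinfo.io/ip", use_connect=True))
# → CONNECT ipinfo.io:443 HTTP/1.1
print(first_request_line("https://ipinfo.io/ip", use_connect=False))
# → GET https://ipinfo.io/ip HTTP/1.1
```

This is why the no-CONNECT option, if undeprecated, would warrant a documentation note: the client-proxy link is still encrypted, but the proxy itself can read everything.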
