BUG: SSL handshake error with Python 3.10 and Pandas read_csv for URLs
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
url = ("https://iridl.ldeo.columbia.edu/"
"SOURCES/.UCSB/.CHIRPS/.v2p0/.monthly/"
".global/.T/last/subgrid/0./add/T/"
"table%3A/1/%3Atable/.csv")
pd.read_csv(url)
Issue Description
With Python 3.10, reading the CHIRPS rainfall data csv file from the URL in the provided example fails with the following error:
Traceback (most recent call last):
File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/usr/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/usr/lib/python3.10/http/client.py", line 1454, in connect
self.sock = self._context.wrap_socket(self.sock,
File "/usr/lib/python3.10/ssl.py", line 512, in wrap_socket
return self.sslsocket_class._create(
File "/usr/lib/python3.10/ssl.py", line 1070, in _create
self.do_handshake()
File "/usr/lib/python3.10/ssl.py", line 1341, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/turnerm/sync/pa-aa-toolbox/run_chirps.py", line 21, in <module>
df = pd.read_csv(url)
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 317, in wrapper
return func(*args, **kwargs)
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 927, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 582, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1421, in __init__
self._engine = self._make_engine(f, self.engine)
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1707, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 672, in get_handle
ioargs = _get_filepath_or_buffer(
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 336, in _get_filepath_or_buffer
with urlopen(req_info) as req:
File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 239, in urlopen
return urllib.request.urlopen(*args, **kwargs)
File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)>
This error does not occur with Python 3.6-3.9. I suspect it is caused by the stricter default TLS settings introduced in Python 3.10. A workaround I found, based on a Stack Overflow post, is to open the URL with a custom SSL context and pass the response to pandas:
import ssl
from urllib.request import urlopen
import pandas as pd
url = ("https://iridl.ldeo.columbia.edu/"
"SOURCES/.UCSB/.CHIRPS/.v2p0/.monthly/"
".global/.T/last/subgrid/0./add/T/"
"table%3A/1/%3Atable/.csv")
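# Start from the default (certificate-verifying) context
context = ssl.create_default_context()
# Fall back to OpenSSL's own default cipher list, which is less strict than Python 3.10's
context.set_ciphers("DEFAULT")
# Open the URL with urllib and hand the file-like response to pandas
result = urlopen(url, context=context)
df = pd.read_csv(result)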
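To see what the workaround actually changes on a given machine, the following minimal sketch compares the cipher suites enabled on a default context with those enabled after `set_ciphers("DEFAULT")`. The helper name `cipher_names` is hypothetical and not part of the report; the result depends on the local Python and OpenSSL build, so no particular output is implied.

```python
import ssl


def cipher_names(context):
    """Return the set of cipher suite names enabled on an SSLContext."""
    return {cipher["name"] for cipher in context.get_ciphers()}


# Context with Python's default settings
strict = ssl.create_default_context()

# Context with the workaround from the report applied
relaxed = ssl.create_default_context()
relaxed.set_ciphers("DEFAULT")

# Cipher suites that only become available through the workaround
added = sorted(cipher_names(relaxed) - cipher_names(strict))

print(f"OpenSSL build: {ssl.OPENSSL_VERSION}")
print(f"Cipher suites enabled only by the workaround: {len(added)}")
for name in added:
    print("  ", name)
```

If the list is non-empty, the server presumably only accepts one of those additional suites, which would be consistent with the handshake failure under the stricter defaults.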
Expected Behavior
The csv should be read correctly into a dataframe, and should look like:
Time
0 Apr 2022
(Note that this dataset is not completely static; the date may eventually change, but the output should have a similar format.)
Installed Versions
INSTALLED VERSIONS
commit : 3bf2cb1b227c80461c7a736718ae17e35d6d5772
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-41-generic
Version : #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.0.dev0+849.g3bf2cb1b2
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.1.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader : None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.5.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : 2022.3.0
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None
Issue Analytics
- Created a year ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Me too 😃 The older machine on which I also get the error also seems to support TLS 1.2.
I think the issue is related to the Python/OpenSSL installation. Unfortunately, I don’t know what is wrong (I would assume it works when you upgrade to Ubuntu 22.04/Fedora 36). Pandas simply uses urllib (and fsspec) to open URLs. If you believe that this is not an issue with the Python/OpenSSL installation, please feel free to open an issue against urllib, which is part of the Python standard library.
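One way to check that pandas is not involved is to open the URL with urllib alone. The sketch below is an illustration, not something from the thread: `try_urlopen` is a hypothetical helper that returns whether the request succeeded together with either the first bytes of the response or the reported failure reason.

```python
import urllib.error
import urllib.request


def try_urlopen(url, context=None, timeout=30):
    """Open url with urllib only (no pandas).

    Returns (True, first_bytes) on success or (False, reason) on failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            # Read only a small prefix; enough to confirm the request worked
            return True, resp.read(200)
    except urllib.error.URLError as err:
        # Handshake failures are wrapped in URLError by urllib
        return False, repr(err.reason)
```

Calling `try_urlopen(url)` with the URL from the reproducible example, and then again with the relaxed `context` from the workaround, should show whether the handshake failure occurs without pandas in the call stack.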
Thanks @turnerm for the report.
from that post…
I’m no security expert, but that can only be a bad thing?
I don’t think pandas should implement any workarounds that weaken security, so removing the bug label and labelling as won’t fix and closing candidate to see what others think.
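For users who still need the data, the relaxed settings can be kept in their own code and limited to the affected server rather than applied everywhere. The following is a minimal sketch under that assumption; `RELAXED_HOSTS` and `context_for` are hypothetical names, and the host is taken from the URL in the report.

```python
import ssl
from urllib.parse import urlsplit

# Hypothetical allowlist: only these hosts get the relaxed cipher list
RELAXED_HOSTS = {"iridl.ldeo.columbia.edu"}


def context_for(url):
    """Return an SSL context for url, relaxed only for allowlisted hosts."""
    # Certificate and hostname verification stay enabled in both cases
    context = ssl.create_default_context()
    if urlsplit(url).hostname in RELAXED_HOSTS:
        # Same workaround as in the report, scoped to a known host
        context.set_ciphers("DEFAULT")
    return context
```

The resulting context can be passed to `urllib.request.urlopen(url, context=context_for(url))`, and the response handed to `pd.read_csv` as in the workaround. This does not make the relaxed cipher list any safer; it only limits where it is used.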