BUG: SSL handshake error with Python 3.10 and Pandas read_csv for URLs

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
url = ("https://iridl.ldeo.columbia.edu/"
       "SOURCES/.UCSB/.CHIRPS/.v2p0/.monthly/"
       ".global/.T/last/subgrid/0./add/T/"
       "table%3A/1/%3Atable/.csv")
pd.read_csv(url)

Issue Description

With Python 3.10, reading the CHIRPS rainfall data CSV file from the URL in the example above fails with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/lib/python3.10/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.10/ssl.py", line 512, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1070, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1341, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/turnerm/sync/pa-aa-toolbox/run_chirps.py", line 21, in <module>
    df = pd.read_csv(url)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 317, in wrapper
    return func(*args, **kwargs)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 927, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 582, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1421, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1707, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 672, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 336, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 239, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)>

This error does not occur with Python 3.6-3.9. I suspect it is due to the stricter default TLS settings in Python 3.10. I found a workaround based on this SO post:

import ssl
from urllib.request import urlopen

import pandas as pd

url = ("https://iridl.ldeo.columbia.edu/"
       "SOURCES/.UCSB/.CHIRPS/.v2p0/.monthly/"
       ".global/.T/last/subgrid/0./add/T/"
       "table%3A/1/%3Atable/.csv")

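# Start from the default context, which still verifies certificates, then
# switch to OpenSSL's "DEFAULT" cipher list, which is broader than the list
# Python 3.10 selects by default.
context = ssl.create_default_context()
context.set_ciphers("DEFAULT")
result = urlopen(url, context=context)
df = pd.read_csv(result)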
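
The workaround above could be wrapped so that the relaxed cipher list applies only to the one request that needs it. The following is a minimal sketch, not part of pandas: the helper names `make_relaxed_context` and `read_csv_relaxed` are hypothetical, and the sketch assumes the server only fails because of the cipher selection. Certificate and hostname verification stay enabled.

```python
import ssl
from urllib.request import urlopen


def make_relaxed_context(ciphers: str = "DEFAULT") -> ssl.SSLContext:
    """Return a verifying SSL context that uses a broader cipher list.

    Hypothetical helper: certificate and hostname checks are left at the
    defaults from ssl.create_default_context(); only the ciphers change.
    """
    context = ssl.create_default_context()
    context.set_ciphers(ciphers)
    return context


def read_csv_relaxed(url: str, **read_csv_kwargs):
    """Fetch `url` with the relaxed context and parse it with pandas.

    Hypothetical helper: extra keyword arguments are passed to pd.read_csv.
    """
    # Imported here so the context helper can be used without pandas.
    import pandas as pd

    with urlopen(url, context=make_relaxed_context()) as response:
        return pd.read_csv(response, **read_csv_kwargs)
```

Calling `read_csv_relaxed(url)` with the URL from the example would then replace the four workaround lines. Keeping the relaxed context local to this helper avoids weakening the settings for every other HTTPS request in the process.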

Expected Behavior

The csv should be read correctly into a dataframe, and should look like:

       Time
0  Apr 2022

(Note that this dataset is not completely static. The date may eventually change, but the output should have a similar format.)

Installed Versions

INSTALLED VERSIONS

commit           : 3bf2cb1b227c80461c7a736718ae17e35d6d5772
python           : 3.10.4.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.13.0-41-generic
Version          : #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.0.dev0+849.g3bf2cb1b2
numpy            : 1.22.4
pytz             : 2022.1
dateutil         : 2.8.2
setuptools       : 58.1.0
pip              : 22.1.2
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 3.0.3
lxml.etree       : 4.9.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.4.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : 2022.5.0
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : 0.8.9
xarray           : 2022.3.0
xlrd             : 2.0.1
xlwt             : 1.3.0
zstandard        : None

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
twoertwein commented, Jun 2, 2022

I’m a bit out of my depth at this point

Me too 😃 The older machine on which I also get the error also seems to support TLS 1.2.

I think the issue is related to the Python/OpenSSL installation. Unfortunately, I don’t know what is wrong (I would assume it works when you upgrade to Ubuntu 22.04/Fedora 36). Pandas simply uses urllib (and fsspec) to open URLs. If you believe that this is not an issue with the Python/OpenSSL installation, please feel free to open an issue at urllib.
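
One way to inspect the local Python/OpenSSL installation is to compare the ciphers offered by the default context with those offered after `set_ciphers("DEFAULT")`. The sketch below is only a diagnostic aid under that assumption: the function names `cipher_names` and `compare_cipher_sets` are hypothetical, and the output depends on the Python and OpenSSL builds on the machine, so no expected result is shown.

```python
import ssl


def cipher_names(context: ssl.SSLContext) -> set:
    """Return the names of the ciphers enabled on `context`."""
    return {cipher["name"] for cipher in context.get_ciphers()}


def compare_cipher_sets() -> dict:
    """Summarize how the default context differs from a relaxed one."""
    default_ctx = ssl.create_default_context()

    # Same starting point, but with OpenSSL's "DEFAULT" cipher list,
    # as in the workaround from the issue description.
    relaxed_ctx = ssl.create_default_context()
    relaxed_ctx.set_ciphers("DEFAULT")

    default_names = cipher_names(default_ctx)
    relaxed_names = cipher_names(relaxed_ctx)
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "minimum_tls": default_ctx.minimum_version.name,
        # Ciphers that only become available with the relaxed list.
        "only_in_relaxed": sorted(relaxed_names - default_names),
    }


if __name__ == "__main__":
    for key, value in compare_cipher_sets().items():
        print(f"{key}: {value}")
```

If the server only accepts ciphers listed under `only_in_relaxed`, that would be consistent with the handshake failing on the default settings and succeeding with the workaround.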

1 reaction
simonjayhawkins commented, Jun 2, 2022

Thanks @turnerm for the report.

A workaround I found based on this SO post:

from that post…

To make connections possible again it is necessary to use weaker security settings.

I’m no security expert, but that can only be a bad thing?

Expected Behavior

The csv should be read correctly into a dataframe, and should look like:

I don’t think pandas should implement any workarounds that weaken security, so I am removing the bug label and labelling this as “won’t fix” and “closing candidate” to see what others think.
