`import ssl` fails → can't fetch HTTPS content

Solution/Summary

  • Use js.fetch instead of e.g. urlretrieve (cf. https://github.com/pyodide/pyodide/issues/529)
  • Some functions, like pandas.read_html, use unsupported urllib functions under the hood.
    • In these cases, you’ll need to fetch the URL yourself, save it to a file, and point pd.read_html at that file
    • Example code
  • Some sites, like Wikipedia, don’t return the CORS headers necessary for fetch to work.
    • I’m not sure what to do about that, or whether it makes sense that e.g. GitHub provides the required CORS headers while Wikipedia doesn’t.
    • One workaround is to download the HTML yourself (either via a system Python install, wget, or “View Page Source” / “Save Page” in the browser), upload it to somewhere fetchable (like a GitHub repo or gist), and point your Jupyterlite notebook at that URL.
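The "fetch it yourself, save it to a file, and point pd.read_html at that file" steps above could look roughly like the sketch below. The helper names `fetch_text` and `save_html` and the default file name `page.html` are hypothetical, chosen only for illustration. The `js` module exists only inside Pyodide, so it is imported lazily, and the target site still has to send the CORS headers that the browser's fetch requires.

```python
from pathlib import Path


async def fetch_text(url: str) -> str:
    """Fetch `url` with the browser's fetch API (only works inside Pyodide)."""
    from js import fetch  # Pyodide-only module, imported lazily on purpose

    response = await fetch(url)
    if not response.ok:
        raise OSError(f"fetch failed with HTTP status {response.status}")
    return await response.text()


def save_html(text: str, path: str = "page.html") -> Path:
    """Write fetched HTML to a local file and return its path."""
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target


# In a JupyterLite cell (top-level await is allowed there), assuming the
# URL is served with permissive CORS headers:
#
#   html = await fetch_text(url)
#   import lxml
#   import pandas as pd
#   tables = pd.read_html(str(save_html(html)))
```

Keeping the file-writing step separate from the fetch means the same `save_html` call also works on HTML obtained some other way, for example a page uploaded to a gist as described in the last bullet.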

Description

import ssl fails in JupyterLite (Pyolite kernel on the demo site) with ModuleNotFoundError: No module named '_ssl' (screenshot omitted).

This affects this block in http/client.py:

try:
    import ssl
except ImportError:
    pass
else:
    class HTTPSConnection(HTTPConnection):
        "This class allows communication via SSL."
        …

which causes HTTPSConnection to never be defined, so there’s no https protocol handler available.

This results in various attempts to fetch https URLs hitting URLError: <urlopen error unknown url type: https>:

  • urlretrieve
  • pandas.read_html

(Note that pandas.read_html requires import lxml before import pandas to even get this far; otherwise it fails with "ImportError: lxml not found, please install it" before even attempting to fetch the URL.)
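To confirm that a given kernel is in the state described above, a small diagnostic along these lines may help. It is only a sketch: the function name `https_support_report` is made up for illustration, and it relies on urllib.request defining `HTTPSHandler` only when http.client has `HTTPSConnection`, which is how CPython's standard library is written.

```python
import http.client
import urllib.request


def https_support_report() -> dict:
    """Report which HTTPS-related pieces of the stdlib are available."""
    try:
        import ssl  # noqa: F401  (only checking that the import succeeds)
        ssl_importable = True
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        ssl_importable = False
    return {
        "ssl_importable": ssl_importable,
        # Defined in http/client.py only when `import ssl` succeeded there.
        "HTTPSConnection": hasattr(http.client, "HTTPSConnection"),
        # Defined in urllib/request.py only when HTTPSConnection exists.
        "HTTPSHandler": hasattr(urllib.request, "HTTPSHandler"),
    }


print(https_support_report())
```

If all three values come back false, urlopen has no handler registered for the https scheme, which matches the "unknown url type: https" error.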

Reproduce

Here is a notebook I created on the JupyterLite demo "server" that shows the end results of this issue (urlretrieve and pd.read_html failing to fetch an https URL).

The full source is:

import ssl          # ❌ fails: ModuleNotFoundError: No module named '_ssl'

from urllib.request import urlretrieve
url = 'https://en.wikipedia.org/wiki/Project_Jupyter'
urlretrieve(url)    # ❌ fails: `URLError: <urlopen error unknown url type: https>`

import lxml
import pandas as pd
pd.read_html(url)   # ❌ fails: `URLError: <urlopen error unknown url type: https>`

Screenshots

[Screenshots omitted: the failing import ssl, urlretrieve, and import lxml.etree / pd.read_html cells.]

Expected behavior

Expect urlretrieve/pd.read_html to be able to fetch HTTPS URLs.

Context

Browser Output
Dispose worker for kernel 0f962d82-1560-48a8-8dd4-d91b6e22bac6
react_devtools_backend.js:2540 Connection lost, reconnecting in 0 seconds.
overrideMethod @ react_devtools_backend.js:2540
_reconnect @ default.js:1245
reconnect @ default.js:526
restart @ default.js:496
async function (async)
restart @ default.js:493
restartKernel @ sessioncontext.js:319
restart @ sessioncontext.js:788
async function (async)
restart @ sessioncontext.js:763
restartKernel @ index.js:1972
(anonymous) @ index.js:559
e.execute @ index.es6.js:371
e._executeKeyBinding @ index.es6.js:531
e.processKeydownEvent @ index.es6.js:470
e.evtKeydown @ index.es6.js:356
e.handleEvent @ index.es6.js:312
pyodide.asm.js:14 Python initialization complete
load-pyodide.js:172 Loading distutils
load-pyodide.js:198 Loading distutils from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/distutils.js
load-pyodide.js:252 Loaded distutils
load-pyodide.js:172 Loading micropip, pyparsing, packaging, distutils
load-pyodide.js:198 Loading micropip from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/micropip.js
load-pyodide.js:198 Loading pyparsing from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pyparsing.js
load-pyodide.js:198 Loading packaging from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/packaging.js
load-pyodide.js:185 distutils already loaded from default channel
load-pyodide.js:252 Loaded micropip, pyparsing, packaging, distutils
load-pyodide.js:172 Loading matplotlib, distutils, cycler, six, kiwisolver, numpy, pillow, pyparsing, python-dateutil, pytz
load-pyodide.js:198 Loading matplotlib from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/matplotlib.js
load-pyodide.js:185 distutils already loaded from default channel
load-pyodide.js:198 Loading cycler from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/cycler.js
load-pyodide.js:198 Loading six from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/six.js
load-pyodide.js:198 Loading kiwisolver from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/kiwisolver.js
load-pyodide.js:198 Loading numpy from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/numpy.js
load-pyodide.js:198 Loading pillow from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pillow.js
load-pyodide.js:185 pyparsing already loaded from default channel
load-pyodide.js:198 Loading python-dateutil from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/python-dateutil.js
load-pyodide.js:198 Loading pytz from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pytz.js
load-pyodide.js:252 Loaded matplotlib, distutils, cycler, six, kiwisolver, numpy, pillow, pyparsing, python-dateutil, pytz
load-pyodide.js:172 Loading jedi, parso, pygments, decorator, setuptools, distutils, pyparsing
load-pyodide.js:198 Loading jedi from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/jedi.js
load-pyodide.js:198 Loading parso from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/parso.js
load-pyodide.js:198 Loading pygments from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/Pygments.js
load-pyodide.js:198 Loading decorator from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/decorator.js
load-pyodide.js:198 Loading setuptools from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/setuptools.js
load-pyodide.js:185 distutils already loaded from default channel
load-pyodide.js:185 pyparsing already loaded from default channel
load-pyodide.js:252 Loaded jedi, parso, pygments, decorator, setuptools, distutils, pyparsing
7f57e2c3-8b14-444f-ab6f-bc93221bdc02:58 Pyolite kernel initialized, version 0.1.0a16
7f57e2c3-8b14-444f-ab6f-bc93221bdc02:320 Perform execution inside worker {type: 'execute-request', data: {…}, parent: {…}}
7f57e2c3-8b14-444f-ab6f-bc93221bdc02:320 Perform execution inside worker {type: 'execute-request', data: {…}, parent: {…}}
7f57e2c3-8b14-444f-ab6f-bc93221bdc02:320 Perform execution inside worker {type: 'execute-request', data: {…}, parent: {…}}
load-pyodide.js:172 Loading lxml, beautifulsoup4, soupsieve, cssselect, html5lib, webencodings, six, pandas, distutils, numpy, python-dateutil, pytz
load-pyodide.js:198 Loading lxml from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/lxml.js
load-pyodide.js:198 Loading beautifulsoup4 from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/beautifulsoup4.js
load-pyodide.js:198 Loading soupsieve from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/soupsieve.js
load-pyodide.js:198 Loading cssselect from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/cssselect.js
load-pyodide.js:198 Loading html5lib from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/html5lib.js
load-pyodide.js:198 Loading webencodings from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/webencodings.js
load-pyodide.js:185 six already loaded from default channel
load-pyodide.js:198 Loading pandas from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pandas.js
load-pyodide.js:185 distutils already loaded from default channel
load-pyodide.js:185 numpy already loaded from default channel
load-pyodide.js:185 python-dateutil already loaded from default channel
load-pyodide.js:185 pytz already loaded from default channel
load-pyodide.js:252 Loaded lxml, beautifulsoup4, soupsieve, cssselect, html5lib, webencodings, six, pandas, distutils, numpy, python-dateutil, pytz

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

bollwyvl commented, Nov 8, 2021 (1 reaction)

Woof, yeah: there are a couple of key parts of stdlib that just aren’t going to work without substantial patching upstream… potentially even to import. We kind of have our hands full getting the jupyter/ipython layer of stuff working, and are trying to stay out of lower-level things when we possibly can.

In addition to networking, I’d imagine anything related to displays, processes, pipes, resources will be “off” and things that sniff platform/implementation-specific stuff will likely be wrong.

The examples posted (and those in the documentation and demos) are the best we have for now, but we would be happy to accept more content for organizing/surfacing this kind of knowledge. For example, we’ve discussed adding message filtering and reporting (which we can do at the any-kernel-message level) and could show links in the status bar etc. when certain errors are encountered. This feature wouldn’t even be specific to lite, but we’d have our own set of “gotchas,” as would other harsh environments like “windows”.

Good luck!

jtpio commented, Nov 8, 2021 (1 reaction)

Thanks @ryan-williams for the detailed report 👍

This is probably more of a limitation of Pyodide.

An alternative is to use the browser fetch, for example:

from js import fetch

URL = "https://eu.httpbin.org"

res = await fetch(URL)
text = await res.text()
print(text)
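A possible extension of the snippet above is a rough stand-in for urlretrieve that writes the response body to a file. This is a sketch under assumptions: it uses `pyodide.http.pyfetch`, which is not mentioned in this issue and may not exist in every Pyodide version, and the helper names `filename_from_url` and `fetch_to_file` are hypothetical. The same CORS restrictions apply as with `js.fetch`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "download") -> str:
    """Use the last path segment of `url` as a file name, else `default`."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default


async def fetch_to_file(url: str, filename: str = "") -> str:
    """Download `url` to a local file inside Pyodide and return its name."""
    # Assumption: pyodide.http.pyfetch is available in this Pyodide build.
    from pyodide.http import pyfetch  # Pyodide-only, imported lazily

    response = await pyfetch(url)
    if not response.ok:
        raise OSError(f"fetch failed with HTTP status {response.status}")
    target = filename or filename_from_url(url)
    with open(target, "wb") as fh:
        fh.write(await response.bytes())
    return target


# In a JupyterLite cell:
#   path = await fetch_to_file("https://eu.httpbin.org", "httpbin.html")
```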
