`import ssl` fails → can't fetch HTTPS content
Solution/Summary
- Use `js.fetch` instead of e.g. `urlretrieve` (cf. https://github.com/pyodide/pyodide/issues/529)
- Some functions, like `pandas.read_html`, use unsupported `urllib` functions under the hood.
  - In these cases, you’ll need to `fetch` the URL yourself, save it to a file, and point `pd.read_html` at that file
  - Example code
- Some sites, like Wikipedia, don’t return the CORS headers necessary for `fetch` to work.
  - I’m not sure what to do about that, or whether it makes sense that e.g. GitHub provides the required CORS headers while Wikipedia doesn’t.
  - One workaround is to download the HTML yourself (either via a system Python install, `wget`, or “View Page Source” / “Save Page” in the browser), upload it to somewhere `fetch`able (like a GitHub repo or gist), and point your Jupyterlite notebook at that URL.
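The fetch-then-save step from the summary above could look roughly like the sketch below. The helper name `fetch_to_file` and its `fetch` parameter are hypothetical and only for illustration; the sketch assumes it runs inside a Pyodide-based kernel, where `from js import fetch` is available and the URL's server sends the CORS headers the browser requires.

```python
async def fetch_to_file(url, path, fetch=None):
    """Fetch `url` with the browser's fetch and write the body text to `path`.

    `fetch` is injectable so the helper can be exercised outside the browser;
    by default it uses the browser's fetch through Pyodide's `js` module.
    """
    if fetch is None:
        # Only importable inside Pyodide / JupyterLite (assumption).
        from js import fetch
    response = await fetch(url)
    if not response.ok:
        raise OSError(f"fetch failed with HTTP status {response.status}")
    text = await response.text()
    # Save to the kernel's in-memory filesystem so file-based readers can use it.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path

# In a notebook cell (top-level await is available there):
#   path = await fetch_to_file(url, "page.html")
#   import lxml
#   import pandas as pd
#   tables = pd.read_html(path)
```

Pointing `pd.read_html` at the saved file keeps pandas away from `urllib`, which is the part that fails in this environment.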
Description
`import ssl` fails in Jupyterlite (Pyolite kernel on the demo site).
This affects this block in `http/client.py`:
    try:
        import ssl
    except ImportError:
        pass
    else:
        class HTTPSConnection(HTTPConnection):
            "This class allows communication via SSL."
            …
which causes `HTTPSConnection` to never be defined, so there’s no `https` protocol handler available.
This results in various attempts to fetch `https` URLs hitting `URLError: <urlopen error unknown url type: https>`:
- `urlretrieve`
- `pandas.read_html`
(Note that `pandas.read_html` requires `import lxml` before `import pandas` to even get this far, otherwise it fails with `ImportError: lxml not found, please install it` before even attempting to fetch the URL.)
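To check whether a given kernel is affected, something like the following sketch can help. The function name `https_support_report` is made up for illustration; it relies only on the standard-library behavior described above, plus the fact that `urllib.request` defines `HTTPSHandler` only when `http.client.HTTPSConnection` exists. In the kernel from this report one would expect all three values to be `False`, though that is inferred from the description rather than verified here.

```python
import http.client
import urllib.request


def https_support_report():
    """Report which pieces of the stdlib HTTPS stack exist in this interpreter."""
    try:
        import ssl  # noqa: F401  (ModuleNotFoundError is a subclass of ImportError)
        ssl_importable = True
    except ImportError:
        ssl_importable = False
    return {
        "ssl_importable": ssl_importable,
        # Defined in http/client.py only when `import ssl` succeeds.
        "HTTPSConnection_defined": hasattr(http.client, "HTTPSConnection"),
        # Defined in urllib/request.py only when HTTPSConnection exists.
        "HTTPSHandler_defined": hasattr(urllib.request, "HTTPSHandler"),
    }


print(https_support_report())
```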
Reproduce
Here is a notebook I created on the jupyterlite demo “server” that shows the end results of this issue (`urlretrieve` and `pd.read_html` failing to fetch an `https` URL).
The full source is:
import ssl # ❌ fails: ModuleNotFoundError: No module named '_ssl'
from urllib.request import urlretrieve
url = 'https://en.wikipedia.org/wiki/Project_Jupyter'
urlretrieve(url) # ❌ fails: `URLError: <urlopen error unknown url type: https>`
import lxml
import pandas as pd
pd.read_html(url) # ❌ fails: `URLError: <urlopen error unknown url type: https>`
Screenshots
- `import ssl`
- `urlretrieve`
- `import lxml.etree` → `pd.read_html`
Expected behavior
Expect `urlretrieve`/`pd.read_html` to be able to fetch HTTPS URLs.
Context
- JupyterLite version: Version 0.1.0-alpha.16 (https://jupyterlite.readthedocs.io/en/latest/_static/lab/index.html)
- Operating System and version: macOS Big Sur 11.6
- Browser and version: Chrome, Version 95.0.4638.69 (Official Build) (arm64)
Browser Output
Dispose worker for kernel 0f962d82-1560-48a8-8dd4-d91b6e22bac6 react_devtools_backend.js:2540 Connection lost, reconnecting in 0 seconds. overrideMethod @ react_devtools_backend.js:2540 _reconnect @ default.js:1245 reconnect @ default.js:526 restart @ default.js:496 async function (async) restart @ default.js:493 restartKernel @ sessioncontext.js:319 restart @ sessioncontext.js:788 async function (async) restart @ sessioncontext.js:763 restartKernel @ index.js:1972 (anonymous) @ index.js:559 e.execute @ index.es6.js:371 e._executeKeyBinding @ index.es6.js:531 e.processKeydownEvent @ index.es6.js:470 e.evtKeydown @ index.es6.js:356 e.handleEvent @ index.es6.js:312 pyodide.asm.js:14 Python initialization complete load-pyodide.js:172 Loading distutils load-pyodide.js:198 Loading distutils from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/distutils.js load-pyodide.js:252 Loaded distutils load-pyodide.js:172 Loading micropip, pyparsing, packaging, distutils load-pyodide.js:198 Loading micropip from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/micropip.js load-pyodide.js:198 Loading pyparsing from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pyparsing.js load-pyodide.js:198 Loading packaging from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/packaging.js load-pyodide.js:185 distutils already loaded from default channel load-pyodide.js:252 Loaded micropip, pyparsing, packaging, distutils load-pyodide.js:172 Loading matplotlib, distutils, cycler, six, kiwisolver, numpy, pillow, pyparsing, python-dateutil, pytz load-pyodide.js:198 Loading matplotlib from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/matplotlib.js load-pyodide.js:185 distutils already loaded from default channel load-pyodide.js:198 Loading cycler from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/cycler.js load-pyodide.js:198 Loading six from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/six.js load-pyodide.js:198 Loading kiwisolver from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/kiwisolver.js 
load-pyodide.js:198 Loading numpy from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/numpy.js load-pyodide.js:198 Loading pillow from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pillow.js load-pyodide.js:185 pyparsing already loaded from default channel load-pyodide.js:198 Loading python-dateutil from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/python-dateutil.js load-pyodide.js:198 Loading pytz from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pytz.js load-pyodide.js:252 Loaded matplotlib, distutils, cycler, six, kiwisolver, numpy, pillow, pyparsing, python-dateutil, pytz load-pyodide.js:172 Loading jedi, parso, pygments, decorator, setuptools, distutils, pyparsing load-pyodide.js:198 Loading jedi from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/jedi.js load-pyodide.js:198 Loading parso from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/parso.js load-pyodide.js:198 Loading pygments from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/Pygments.js load-pyodide.js:198 Loading decorator from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/decorator.js load-pyodide.js:198 Loading setuptools from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/setuptools.js load-pyodide.js:185 distutils already loaded from default channel load-pyodide.js:185 pyparsing already loaded from default channel load-pyodide.js:252 Loaded jedi, parso, pygments, decorator, setuptools, distutils, pyparsing 7f57e2c3-8b14-444f-ab6f-bc93221bdc02:58 Pyolite kernel initialized, version 0.1.0a16 7f57e2c3-8b14-444f-ab6f-bc93221bdc02:320 Perform execution inside worker {type: 'execute-request', data: {…}, parent: {…}} 7f57e2c3-8b14-444f-ab6f-bc93221bdc02:320 Perform execution inside worker {type: 'execute-request', data: {…}, parent: {…}} 7f57e2c3-8b14-444f-ab6f-bc93221bdc02:320 Perform execution inside worker {type: 'execute-request', data: {…}, parent: {…}} load-pyodide.js:172 Loading lxml, beautifulsoup4, soupsieve, cssselect, html5lib, webencodings, six, pandas, distutils, numpy, python-dateutil, 
pytz load-pyodide.js:198 Loading lxml from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/lxml.js load-pyodide.js:198 Loading beautifulsoup4 from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/beautifulsoup4.js load-pyodide.js:198 Loading soupsieve from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/soupsieve.js load-pyodide.js:198 Loading cssselect from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/cssselect.js load-pyodide.js:198 Loading html5lib from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/html5lib.js load-pyodide.js:198 Loading webencodings from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/webencodings.js load-pyodide.js:185 six already loaded from default channel load-pyodide.js:198 Loading pandas from https://cdn.jsdelivr.net/pyodide/v0.18.1/full/pandas.js load-pyodide.js:185 distutils already loaded from default channel load-pyodide.js:185 numpy already loaded from default channel load-pyodide.js:185 python-dateutil already loaded from default channel load-pyodide.js:185 pytz already loaded from default channel load-pyodide.js:252 Loaded lxml, beautifulsoup4, soupsieve, cssselect, html5lib, webencodings, six, pandas, distutils, numpy, python-dateutil, pytz
Top GitHub Comments
Woof, yeah: there are a couple of key parts of stdlib that just aren’t going to work without substantial patching upstream… Potentially even to import. We kind of have our hands full getting the jupyter/ipython layer of stuff working, and are trying to stay out of lower-level stuff like this when we possibly can.
In addition to networking, I’d imagine anything related to displays, processes, pipes, resources will be “off” and things that sniff platform/implementation-specific stuff will likely be wrong.
The examples posted (and those in the documentation and demos) are the best we have for now, but we would be happy to accept more content for organizing/surfacing this kind of knowledge. For example, we’ve discussed adding message filtering and reporting (which we can do at the any-kernel-message level) and could show links in the status bar etc. when certain errors are encountered. This feature wouldn’t even be specific to lite, but we’d have our own set of “gotchas,” as would other harsh environments like “windows”.
Good luck!
Thanks @ryan-williams for the detailed report 👍
This is probably more of a limitation of Pyodide.
An alternative is to use the browser `fetch`, for example:
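A minimal sketch of that approach (not necessarily the snippet the commenter had in mind), assuming a Pyodide-based kernel where `from js import fetch` works. The helper name `fetch_text` and its injectable `fetch` argument are hypothetical.

```python
async def fetch_text(url, fetch=None):
    """Return the body of `url` as text using the browser's fetch."""
    if fetch is None:
        # Available only inside Pyodide / JupyterLite (assumption).
        from js import fetch
    response = await fetch(url)
    if not response.ok:
        raise OSError(f"fetch failed with HTTP status {response.status}")
    return await response.text()

# In a notebook cell:
#   html = await fetch_text("https://example.com/")
```

If the target site does not send the required CORS headers, the browser rejects the request, so the awaited call should raise rather than return a response.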