Workflow of modifying CPython in Pyodide

Following a recent discussion with @antocuni, we should probably better document the recommended workflow for working on a modified CPython as part of Pyodide. To some extent, this workflow also applies to other patched packages.

By default, the build system downloads the tar.gz, extracts it to cpython/build/, and applies the patches.

In principle, one could just modify files in this build directory and re-run make. The changes should not be overwritten unless one runs make -C cpython clean. The problem is that it is then difficult to create patches or to track changes in general.

Another solution, which we have been using more lately, is to:

  1. clone the CPython repo via your GitHub fork
  2. check out the version we currently use (3.10.2)
  3. create a new branch and apply all patches in cpython/patches on this branch
  4. work in this branch, which has the advantage that new commits can easily be exported as patches, as documented here
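
The four steps above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the issue: the helper names apply_patches and export_patches are made up, the tag name v3.10.2 and all paths in the usage comment are examples, and the patches are assumed to be in git format-patch format so that git am can apply them (otherwise git apply would be needed instead).

```shell
#!/bin/sh
# Sketch: keep Pyodide's CPython patches as commits on a branch.
# Helper names, tag, branch and paths are hypothetical examples.
# Pass ABSOLUTE paths: `git -C` makes relative paths resolve inside the repo.

# apply_patches <cpython-checkout> <base-tag> <new-branch> <patch-dir>
apply_patches() {
    repo=$1; base=$2; branch=$3; patches=$4
    # Steps 2 and 3: start a new branch at the base version ...
    git -C "$repo" checkout -q -b "$branch" "$base" || return 1
    # ... and apply every patch, in glob order, as its own commit.
    git -C "$repo" am -q "$patches"/*.patch
}

# export_patches <cpython-checkout> <base-tag> <output-dir>
export_patches() {
    # Step 4: write one .patch file per commit made since the base tag.
    git -C "$1" format-patch -q "$2" -o "$3"
}

# Example usage (hypothetical locations):
#   git clone https://github.com/<your-user>/cpython.git "$HOME/src/cpython"
#   apply_patches "$HOME/src/cpython" v3.10.2 pyodide-3.10.2 "$HOME/src/pyodide/cpython/patches"
#   ... edit and `git commit` as usual ...
#   export_patches "$HOME/src/cpython" v3.10.2 "$HOME/src/pyodide/cpython/patches"
```

Keeping each patch as a separate commit means that an interactive rebase can be used to edit or drop an individual patch before exporting again.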

To make things faster for development and avoid exporting patches each time, one can likely symlink the CPython working tree to pyodide/cpython/build, though one then needs to run

touch pyodide/cpython/build/.patched

so that it is not overwritten. I think it should work, but I have not personally tried this last part.
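
Put together, the symlink idea might look like the sketch below. Like the suggestion itself, it has not been tested against a real Pyodide build. The helper name link_cpython_tree and the paths in the usage comment are hypothetical; only the cpython/build location and the .patched marker come from the description above.

```shell
#!/bin/sh
# Sketch: point pyodide/cpython/build at an existing CPython working tree.
# The function name and example paths are hypothetical.

# link_cpython_tree <cpython-checkout> <pyodide-root>
link_cpython_tree() {
    src=$1; root=$2
    # Refuse to clobber a build directory (or symlink) that already exists.
    if [ -e "$root/cpython/build" ] || [ -L "$root/cpython/build" ]; then
        echo "cpython/build already exists; remove it first" >&2
        return 1
    fi
    ln -s "$src" "$root/cpython/build" || return 1
    # Marker file from the issue; through the symlink it is actually
    # created inside the CPython working tree.
    touch "$root/cpython/build/.patched"
}

# Example usage (hypothetical locations):
#   link_cpython_tree "$HOME/src/cpython" "$HOME/src/pyodide"
```

Because .patched ends up inside the CPython checkout, it will show up as an untracked file there unless it is ignored.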

Maybe @hoodmane, who has worked on updating CPython more recently, also has some thoughts on this workflow.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (8 by maintainers)

hoodmane commented, Jun 15, 2022

I was more thinking of a higher level overview of what’s happening: I suppose that somehow the process is roughly …

An overview of make -C cpython is as follows:

  1. Clone and patch CPython. Also clone zlib, bz2, and sqlite, plus my fork of libffi
  2. Build libzlib.a, libbz2.a, libsqlite3.a, libffi.a
  3. Configure and build libpython3.10.a
  4. Install the libraries into installs/python-3.10.2/lib/, and the Python stdlib into installs/python-3.10.2/lib/python3.10
  5. Make the Emscripten sysconfigdata and install it into installs/python-3.10.2/sysconfigdata (for the build system) and installs/python-3.10.2/lib/python3.10 (for use from Pyodide)

These archive files contain wasm object files. I think the wasm object file format is standard LLVM object files with the wasm instruction set. Some object files contain JavaScript functions. The JavaScript function definitions are stored as strings in a custom “js” data segment, and a custom “linking” section contains the offsets into the data segment.

Then, when you run make in the root directory, we do the following:

  1. The TypeScript file src/core/pyproxy.ts uses C macros, so we run it through the C preprocessor and store the output in src/js/pyproxy.gen.ts. Also, src/core/error_handling.ts is copied to src/js/error_handling.gen.ts.
  2. The TypeScript files in src/js other than pyodide.ts are rolled up into src/js/_pyodide.out.js. The loader file src/js/pyodide.ts is rolled up into pyodide.js and pyodide.mjs.
  3. The C files in src/core are compiled to object files.
  4. We link the wasm module, producing dist/pyodide.asm.js, dist/pyodide.asm.wasm, and dist/pyodide.asm.data.
  5. We do some postprocessing that strips a few things out of the pyodide.asm.js module and adds "use strict", etc.
  6. We make the TypeScript type definitions dist/pyodide.d.ts.
  7. We tar the Pyodide Python modules src/py/pyodide and src/py/_pyodide and store the result as dist/pyodide_py.tar.
  8. We build the CPython test modules, then tar the Python standard library tests and put the result in dist/test.tar.
  9. We build the appropriate subset of the packages.

The linker stage is the most complicated “magic happens” part, but luckily the magic happens inside of emcc. emcc analyzes all of the object files and the settings and decides what sort of host libraries to build. The host libraries are built in a mixture of C, C++, and a special Emscripten dialect of JavaScript. It generates a linker invocation to wasm-ld (LLVM's WebAssembly linker), which produces pyodide.asm.wasm, and then it runs the binaryen optimizer over pyodide.asm.wasm. It looks up all the JavaScript in the object files and pulls it out. It puts all the JavaScript libraries, the wasm bootstrap system, the JavaScript functions, etc. into pyodide.asm.js. We use --pre-js to inject _pyodide.out.js into it. We also use --preload-file to tell it to initialize the file system with the Python stdlib. The Python stdlib goes into dist/pyodide.asm.data.
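
To make the roles of --pre-js and --preload-file more concrete, here is a heavily simplified sketch of what such a link command could look like. It only prints the command instead of running it, and it is not the actual Pyodide Makefile rule: the helper name print_link_cmd, the object file glob, and the /lib/python3.10 mount point are assumptions, and the real invocation passes many more settings. Only -o, --pre-js and --preload-file (with its src@dst form) are the emcc options referred to above.

```shell
#!/bin/sh
# Sketch: print (not run) a simplified emcc link command.
# print_link_cmd, the input file names and the mount point are hypothetical.

# print_link_cmd <stdlib-dir>
print_link_cmd() {
    stdlib_dir=$1   # e.g. installs/python-3.10.2/lib/python3.10
    # With -o dist/pyodide.asm.js, emcc also writes pyodide.asm.wasm and,
    # because of --preload-file, pyodide.asm.data next to it.
    printf '%s\n' "emcc src/core/*.o libpython3.10.a \
-o dist/pyodide.asm.js \
--pre-js src/js/_pyodide.out.js \
--preload-file $stdlib_dir@/lib/python3.10"
}

# Example usage:
#   print_link_cmd installs/python-3.10.2/lib/python3.10
```

The src@dst form of --preload-file maps a directory on the build machine to a path inside the virtual file system that the generated loader sets up at startup.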

hoodmane commented, Jun 15, 2022

I think that the plan is to apply all the patches upstream to be able to use a vanilla CPython anyway, at some point?

Yeah, all 12 of the current patches have been upstreamed into Python 3.11. I think we may want to add a new patch for #2142, but I think that will be easy to upstream once we get it working.
