Workflow of modifying CPython in Pyodide
Related to a recent discussion with @antocuni, we should probably better document the recommended workflow for working on a modified CPython as part of Pyodide. To some extent, this workflow also applies to other patched packages.
By default, the build system downloads the tar.gz, extracts it to cpython/build/, and applies the patches.
One could potentially just modify files in this build directory and re-run make. It should not overwrite the changes unless one runs make -C cpython clean. The problem, however, is that it is then difficult to create patches or generally track changes.
Another solution that we have been using more lately is to:
- clone the CPython repo via your GitHub fork
- check out the version we use now (3.10.2)
- create a new branch and apply all patches in cpython/patches on this branch
- then work in this branch, which has the advantage that new commits can be easily exported as patches as documented here
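The branch-and-patch steps above can be sketched as two small shell helpers. This is only a sketch under stated assumptions: it assumes the files in cpython/patches were produced by `git format-patch`, so that `git am` accepts them (otherwise `git apply` or `patch -p1` would be needed instead). The function names `prepare_cpython_branch` and `export_cpython_patches`, the default branch name, and the paths in the usage comment are hypothetical and not part of Pyodide.

```shell
#!/bin/sh
# Hypothetical helper: create a work branch from a release tag and apply patches.
# Usage: prepare_cpython_branch <cpython-checkout> <tag> <patches-dir> [branch]
prepare_cpython_branch() {
    repo=$1
    tag=$2
    # Resolve to an absolute path, because `git -C` changes directory.
    patches=$(cd "$3" && pwd) || return 1
    branch=${4:-pyodide-patched}   # arbitrary default branch name

    # Start a new branch from the release tag (e.g. v3.10.2).
    git -C "$repo" checkout -q -b "$branch" "$tag" || return 1

    # Apply each patch in sorted order, one commit per patch.
    # Assumption: the patches are in `git format-patch` format.
    for p in "$patches"/*.patch; do
        git -C "$repo" am -q "$p" || return 1
    done
}

# Hypothetical helper: export every commit made on top of <base> as patch files.
# Usage: export_cpython_patches <cpython-checkout> <base> <output-dir>
export_cpython_patches() {
    repo=$1
    base=$2
    mkdir -p "$3" || return 1
    outdir=$(cd "$3" && pwd) || return 1
    git -C "$repo" format-patch -q -o "$outdir" "$base"
}

# Possible usage (paths are placeholders):
#   git clone https://github.com/<your-username>/cpython.git
#   prepare_cpython_branch ./cpython v3.10.2 ./pyodide/cpython/patches
#   ... commit changes on the new branch ...
#   export_cpython_patches ./cpython v3.10.2 ./new-patches
```

Exporting relative to the release tag regenerates the whole patch series, including the patches that were applied in the first step, so the output directory can be compared against cpython/patches before anything is replaced.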
To make things faster for development and avoid exporting patches each time, one can likely symlink the CPython working tree to pyodide/cpython/build. However, one then needs to run
touch pyodide/cpython/build/.patched
so that it isn't overwritten. I think it should work, but personally I have not tried this last part.
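A minimal sketch of the symlink idea follows. Like the suggestion itself, it has not been tested against a real Pyodide build. The function name `link_cpython_tree` is hypothetical, and the location of the `.patched` marker is taken from the text above rather than verified against the Makefile.

```shell
#!/bin/sh
# Hypothetical helper: point pyodide/cpython/build at an existing CPython working tree.
# Usage: link_cpython_tree <cpython-worktree> <pyodide-root>
link_cpython_tree() {
    # Use an absolute target so the link stays valid from any directory.
    worktree=$(cd "$1" && pwd) || return 1
    pyodide=$2
    build="$pyodide/cpython/build"

    # Refuse to clobber a real build directory created by the normal build.
    if [ -e "$build" ] && [ ! -L "$build" ]; then
        echo "error: $build exists and is not a symlink" >&2
        return 1
    fi

    mkdir -p "$pyodide/cpython" || return 1
    ln -sfn "$worktree" "$build" || return 1

    # Marker file mentioned above, meant to signal that patches are already applied.
    touch "$build/.patched"
}

# Possible usage (paths are placeholders):
#   link_cpython_tree ./cpython ./pyodide
```

Because `build` is a symlink, the `.patched` file is created inside the CPython working tree itself, so it will show up there as an untracked file.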
Maybe @hoodmane, who has worked on updating CPython more recently, also has some thoughts on this workflow.
An overview of make -C cpython is as follows:
- [...] installs/python-3.10.2/lib/, and Python stdlib into installs/python-3.10.2/lib/python3.10
- [...] installs/python-3.10.2/sysconfigdata (for the build system) and installs/python-3.10.2/lib/python3.10 (for use from Pyodide)
These archive files contain wasm object files. I think the wasm object file format is the standard LLVM object file format with the wasm instruction set. Some object files contain JavaScript functions. The JavaScript function definitions are stored as strings in a custom “js” data segment, and a custom “linking” section contains the offsets into the data segment.
Then when you run make in the root directory we do the following:
- src/core/pyproxy.ts uses C macros, so we run it through the C preprocessor and store the output to src/js/pyproxy.gen.ts. Also, src/core/error_handling.ts is copied to src/js/error_handling.gen.ts.
- [...] src/js other than pyodide.ts are rolled up into src/js/_pyodide.out.js. The loader file src/js/pyodide.ts is rolled into pyodide.js and pyodide.mjs.
- [...] dist/pyodide.asm.js, dist/pyodide.asm.wasm, and dist/pyodide.asm.data.
- [...] pyodide.asm.js module and adds "use strict", etc.
- [...] dist/pyodide.d.ts
- [...] src/py/pyodide and src/py/_pyodide and store this as dist/pyodide_py.tar
- [...] dist/test.tar
The linker stage is the most complicated “magic happens” part, but luckily the magic happens inside of emcc. emcc analyzes all of the object files and the settings and decides what sort of host libraries to build. The host libraries are built in a mixture of C, C++, and a special Emscripten dialect of JavaScript. It generates a linker invocation to wasm-ld, which produces pyodide.asm.wasm, and then it runs the binaryen optimizer over pyodide.asm.wasm. It looks up all the JavaScript in the object files and pulls it out. It puts all the JavaScript libraries, the wasm bootstrap system, the JavaScript functions, etc. into pyodide.asm.js. We use --pre-js to inject _pyodide.out.js into it. We also use --preload-file to tell it to initialize the file system with the Python stdlib. The Python stdlib goes into dist/pyodide.asm.data.
Yeah, all 12 of the current patches have been upstreamed into Python 3.11. I think we may want to add a new patch for #2142, but I think that will be easy to upstream once we get it working.