Support inlining C and C++ (or even LLVM IR) code into nopython-jitted function/class
Feature request
It would be great if it were possible to inline regular C or C++ code into a nopython-jitted function, something like the following:
@numba.njit
def f(a):
    c_funcs = numba.c_func("""
        inline int add(int a, int b) { return a + b; }
        inline int mul(int a, int b) { return a * b; }
    """)
    b = 3
    for i in range(5):
        a = c_funcs.mul(c_funcs.add(a, a), b)
    return a
The main idea here is that the code of the C functions (add, mul) should be inlined into f() and optimized by LLVM together with the surrounding Python code.
Of course, there is CFFI support, which allows compiling any C functions into a .pyd module and then using them inside an njitted function. But the drawback is that these C functions are not inlined into the njitted code (they are called by address), hence not optimized by LLVM as a whole.
I think there should be some way to mix Numba's Python code and C/C++ code directly, because not everything can be done in pure Python.
For example, if I want to multiply u64 x u64 -> u128, there is no such single-instruction operation in Python or Numba, while in C/C++ it can be done with unsigned __int128 c = (unsigned __int128)a * b; in Clang/GCC, or uint64_t hi, lo = _umul128(a, b, &hi); in MSVC. Either compiles to a single assembler mul instruction taking just a few CPU cycles. In Python you can't express this as one CPU instruction.
Of course, one can make an array-wide u128-multiplication C function using CFFI, so that the non-inlined call overhead is amortized and small. But it is not always possible to act on a whole array: for example, I want to implement a jitclass that emulates u128 and use this u128 class everywhere for single-value variables in some njitted mathematical code where there is no work on arrays at all.
Another use case is to implement a jitclass that emulates BigInteger, so that a BigInteger (similar to Python's int) becomes available in nopython functions. Of course, an efficient single-value (non-array) BigInteger is not possible to implement without inlineable C/C++ functions.
Why is C/C++ inlining crucial? Because it often happens that Numba's Python lacks some operation which in C/C++ (or even assembler) can be done in 1-3 CPU instructions. A non-inlined function call that does only 1-3 instructions has far too much overhead.
Also, as Numba is LLVM-based, it would be great to be able to inline LLVM IR (LLVM Intermediate Representation) as well, or some other assembler-like language. When Python code is jitted it is converted to LLVM IR at some point anyway, hence inlining one piece of LLVM IR into another looks like a natural thing.
Inlining LLVM IR would allow anybody to inline code from any language. For example, you don't support Rust, but a Rust developer can compile Rust to LLVM IR (the official Rust compiler uses LLVM as its backend) and then inline this LLVM IR into your nopython-jitted code. Hence LLVM IR inlining would allow supporting any possible language based on LLVM.
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 3
- Comments: 9 (4 by maintainers)

@polkovnikov here’s an example of how to do what’s in the OP, the other cases you have mentioned are simplifications of this. I hope at some point to extract some useful parts into Numba’s public extension API (the part about linking in some bitcode). The thing I’ve not sorted out yet in this example is the forcible inlining of the functions defined in the C source.
Correct, this example does do that. But in your case, you'd not access the function via ctypes; you'd just generate a call to it using an @intrinsic (https://numba.readthedocs.io/en/stable/extending/high-level.html#implementing-intrinsics).
This is understood, and I think possible.
Yes, this is why you need to compile the external source to bitcode/LLVM IR and add that module to the library that Numba is generating code into, so that it can all be linked together and inlining and many other related optimisations take place.
I’ve got an example of how to do all this but have one more thing to work out prior to sharing it.
The conclusion from the Numba meeting was that this is probably not something Numba can support directly, due to the complexity of ensuring valid compilers, LLVM IR versions, type-system behaviours, etc. However, some of the parts needed to actually implement it could well be abstracted into something Numba could support, for example, linking in an external bitcode source.