
Support arbitrary fixed-size integers

See original GitHub issue

Feature request

LLVM supports arbitrary (but fixed) bit-size integers; for example, you can create a 1234-bit integer. Clang also supports arbitrary-size integers: just write using u1234 = unsigned _ExtInt(1234); to create the 1234-bit integer type u1234.
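
For reference, the LLVM side of this already works from Python via llvmlite (the library Numba uses to build LLVM IR). A minimal sketch, using only standard llvmlite API:

from llvmlite import ir

# Build a tiny LLVM module with a function adding two 1234-bit integers.
i1234 = ir.IntType(1234)                       # arbitrary fixed bit width
module = ir.Module(name="extint_demo")
fnty = ir.FunctionType(i1234, (i1234, i1234))
fn = ir.Function(module, fnty, name="add_i1234")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
a, b = fn.args
builder.ret(builder.add(a, b))
print(module)  # textual LLVM IR using i1234 throughout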

It would be great if Numba could use this LLVM/Clang feature of defining any-size integers, so that you could write something like dtype = numba.ext_int(1234); value = dtype(789) inside an @njit-ed function.

I understand that Numpy doesn’t support int128 or anything larger than int64, but Numba could still use any-sized ints for non-Numpy structures such as list/set/dict.

For example, if I do a = []; a.append(numba.ext_int(1234)(789)), then Numba could convert it to the C++ equivalent std::vector<_ExtInt(1234)> a; a.push_back(789);. (BTW, am I right that you’re converting Python code to C++, or are you using C as the backend?)

Same for dict: d = {numba.ext_int(123)(456) : numba.ext_int(789)(100)} could be converted to std::unordered_map<_ExtInt(123), _ExtInt(789)> d; d[456] = 100;. Similarly for set().

So, to conclude: for Numpy you can disallow _ExtInt(), but for list/dict/set you can allow it.

Implementing such arbitrary-sized integer types would, as a side effect, deliver the long-awaited int128 feature, admittedly not within Numpy arrays, but at least in list/dict/set.

Also, can you maybe suggest a temporary workaround? I expect that Numba has some low-level support for creating any type, for example through @intrinsic, IRBuilder, or other ways of injecting LLVM IR. Maybe the Numba gurus can give a ready-made example of implementing something like _ExtInt() support within Numba’s current capabilities, at least as an @intrinsic?
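
As a rough illustration of what such an @intrinsic workaround might look like (a hedged sketch, not an official recipe from the maintainers; mul_high_u64 is a made-up name), here is an intrinsic that widens two uint64 values to i128 in LLVM IR, multiplies them, and returns the high 64 bits of the product:

from numba import njit, types
from numba.extending import intrinsic
from llvmlite import ir
import numpy as np

@intrinsic
def mul_high_u64(typingctx, a, b):
    # Fixed signature: uint64 x uint64 -> uint64 (high half of the product).
    sig = types.uint64(types.uint64, types.uint64)

    def codegen(context, builder, signature, args):
        i128 = ir.IntType(128)
        x = builder.zext(args[0], i128)            # widen both operands
        y = builder.zext(args[1], i128)
        prod = builder.mul(x, y)                   # full 128-bit product
        hi = builder.lshr(prod, ir.Constant(i128, 64))
        return builder.trunc(hi, ir.IntType(64))   # keep the top 64 bits

    return sig, codegen

@njit
def demo(a, b):
    return mul_high_u64(a, b)

print(demo(np.uint64(2**63), np.uint64(4)))  # 2 == (2**65) >> 64

Note that the 128-bit value exists only inside the intrinsic here; exposing a wide integer as a first-class Numba type would additionally require registering a new type and data model.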

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
stuartarchibald commented, Oct 1, 2021

> @stuartarchibald Strange that udiv is not implemented for all CPUs. It is not that difficult to implement a simple schoolbook division algorithm. Clang could emit a warning that division performance is reduced due to the absence of a built-in CPU instruction, but I think it shouldn’t break compilation.

> I just found this issue 10 minutes ago, when compiling my C++ program. I used the type _ExtInt(512) and did a division, and instead of reporting an error or an unresolved symbol the compiler just crashed with a crash dump; at the top of the crash stack there was an assert with the message "UDiv not implemented!". I’m on a laptop with an x86_64 CPU.

This is indeed what happens.

> Regarding @jitclass - strange that methods can’t be implemented as @intrinsic; that looks like something that would be good to have in the future. Regular functions can be intrinsics, so why not methods? Intrinsics would allow method calls to be inlined and optimized together with the outer code that uses them. If I implement an integer type through jitclass, then having intrinsic methods is crucial for speed, because tiny operations like add/sub would otherwise carry the huge overhead of an extra CALL instruction.

@intrinsics generate LLVM IR directly into the current block, so they cannot be callable functions; they are conceptually different as their purpose is different.

@jitclass just uses the same technology as the rest of the compiler; its methods are essentially just @njit functions, and LLVM will likely just inline them. I’d encourage writing actual use cases and timing performance before worrying about the impact of code generation.

LLVM makes decisions about optimisation/inlining based on its analysis of the generated IR.

> For example, if I implement my own u64 through jitclass, then add/sub may take 1 CPU cycle, and if these methods are not inlined into the outer code as intrinsics are, then the extra CALL will add several more CPU cycles. Not to mention that inlining intrinsics not only saves the CALL but also enables other optimizations. For example, if I explicitly zero the upper half of an integer, the optimizer may figure out that it needs fewer add/mul instructions than in the general case. Or, if in the outer code the integer lives in one register, there is no need to copy it to another register as a CALL would require.

LLVM will likely just figure out it can be inlined. Example:

from numba import njit, types
from numba.experimental import jitclass
import numpy as np

spec = {'_number': types.float64}

@jitclass(spec=spec)
class FloatNumber:
    def __init__(self, the_number):
        self._number = the_number

    def add(self, x):
        """Adds self._number to x and returns the value"""
        return self._number + x

@njit
def foo(n, FloatNumberInst):
    acc = 0
    for i in range(n):
        acc += FloatNumberInst.add(i)
    return acc

FloatNumberInst = FloatNumber(1.)

print(foo(10, FloatNumberInst))

print(foo.inspect_asm(foo.signatures[0]))

The x86_64 disassembly for foo looks like:

_ZN8__main__7foo$244ExN8instance8jitclass49FloatNumber$237fa45fbe4e50$3c_number$3afloat64$3eE:
        testq   %rdx, %rdx
        jle     .LBB0_1
        movsd   (%r8), %xmm0
        leaq    -1(%rdx), %rcx
        movl    %edx, %eax
        andl    $3, %eax
        cmpq    $3, %rcx
        jae     .LBB0_4
        xorpd   %xmm1, %xmm1
        xorl    %ecx, %ecx
        jmp     .LBB0_6
.LBB0_1:
        xorpd   %xmm1, %xmm1
        jmp     .LBB0_8
.LBB0_4:
        andq    $-4, %rdx
        xorpd   %xmm1, %xmm1
        xorl    %ecx, %ecx
        .p2align        4, 0x90
.LBB0_5:
        xorps   %xmm2, %xmm2
        cvtsi2sd        %rcx, %xmm2
        addsd   %xmm0, %xmm2
        addsd   %xmm1, %xmm2
        leaq    1(%rcx), %rsi
        xorps   %xmm1, %xmm1
        cvtsi2sd        %rsi, %xmm1
        addsd   %xmm0, %xmm1
        addsd   %xmm2, %xmm1
        leaq    2(%rcx), %rsi
        xorps   %xmm2, %xmm2
        cvtsi2sd        %rsi, %xmm2
        addsd   %xmm0, %xmm2
        addsd   %xmm1, %xmm2
        leaq    3(%rcx), %rsi
        xorps   %xmm1, %xmm1
        cvtsi2sd        %rsi, %xmm1
        addsd   %xmm0, %xmm1
        addsd   %xmm2, %xmm1
        addq    $4, %rcx
        cmpq    %rcx, %rdx
        jne     .LBB0_5
.LBB0_6:
        testq   %rax, %rax
        je      .LBB0_8
        .p2align        4, 0x90
.LBB0_7:
        xorps   %xmm2, %xmm2
        cvtsi2sd        %rcx, %xmm2
        incq    %rcx
        addsd   %xmm0, %xmm2
        addsd   %xmm2, %xmm1
        decq    %rax
        jne     .LBB0_7
.LBB0_8:
        movsd   %xmm1, (%rdi)
        xorl    %eax, %eax
        retq
.Lfunc_end0:

i.e. LLVM inlined everything to do with the jitclass and then unrolled the accumulator loop.

Perhaps consider using https://numba.discourse.group for discussion items like this; other users who do not follow the issue tracker may be interested. Thanks.

1 reaction
stuartarchibald commented, Sep 30, 2021

> @stuartarchibald So potentially it is possible to write a @jitclass that implements a big integer, using IRBuilder like you did?

Yes, but the type system would still most likely need a new type to be registered for this “big int” so as to be able to describe the field in the jitclass.

> Is it possible for @jitclass to have all its methods as @intrinsics doing codegen?

I’m pretty sure the methods cannot be @intrinsics but they can call @intrinsics.
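
A minimal sketch of that pattern (illustrative, with made-up names, not code from this thread): a jitclass method that calls an @intrinsic emitting a plain LLVM add:

from numba import types
from numba.experimental import jitclass
from numba.extending import intrinsic
import numpy as np

@intrinsic
def wrapping_add_u64(typingctx, a, b):
    sig = types.uint64(types.uint64, types.uint64)

    def codegen(context, builder, signature, args):
        # Plain LLVM add on i64; wraps modulo 2**64.
        return builder.add(args[0], args[1])

    return sig, codegen

@jitclass(spec={'_value': types.uint64})
class WrappingCounter:
    def __init__(self, value):
        self._value = value

    def bump(self, x):
        # A jitclass method calling an @intrinsic; LLVM is free to inline both.
        self._value = wrapping_add_u64(self._value, x)
        return self._value

c = WrappingCounter(np.uint64(2**64 - 1))
print(c.bump(np.uint64(1)))  # 0: wrapped around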

> So that the final variable of this class type inside an njit function will behave identically to a normal int, but will contain a fixed-bitsize big int?

Potentially, yes.

> BTW, as I remember, @jitclass doesn’t support special methods like __add__ yet.

I think that is correct as of version 0.54.x.

> Also, is it possible to implement such an integer type without @jitclass? One can certainly implement functions like x = add(y, z) for all operations using @intrinsic, but they are not as handy as x = y + z.

Yes, this is possible, but at present it would likely be quite involved. It’s also going to be limited by what LLVM can handle for a given architecture. For example, a udiv on the i1234 type above is not supported in LLVM on at least Linux x86_64 under LLVM version 11.
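
A minimal llvmlite-based sketch that should reproduce this (hedged: on affected LLVM builds the process aborts with the "UDiv not implemented!" assert rather than raising a Python exception, so run it in a throwaway interpreter):

import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

# A function that udivs two 512-bit integers, as in the crash report above.
ir_text = """
define i512 @div512(i512 %a, i512 %b) {
entry:
  %q = udiv i512 %a, %b
  ret i512 %q
}
"""

mod = llvm.parse_assembly(ir_text)
mod.verify()
target = llvm.Target.from_default_triple()
tm = target.create_target_machine()
obj = tm.emit_object(mod)  # may abort here on affected targets
print(len(obj), "bytes of object code")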
