After the first function with default arguments is compiled, all subsequently compiled functions with default arguments are very slow to run
- I have tried using the latest released version of Numba
- I have included below a minimal working reproducer
This is a strange bug that seems related to the closed issue #2029. The fix for that issue appears to work only for the first function with default arguments that Numba compiles; it seems to have no effect on functions with default arguments compiled afterwards.
The following code reproduces the issue:
import numba
@numba.njit
def func(x, y=1.0, z=1):
    return x+x + y + z
func(2.2)
%timeit func(2.2)
@numba.njit
def func2(x, y=1.0, z=1):
    return x+x + y + z
func2(2.2)
%timeit func2(2.2)
The output looks something like the following (first `func`, then `func2`). The slowdown also affects every subsequently compiled function that has default arguments.
294 ns ± 2.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
27.2 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
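The `%timeit` lines above only work in IPython or Jupyter. A minimal sketch of the same measurement as a plain script, using the standard-library `timeit` module, might look like the following. The helper name `per_call_ns` and the loop counts are illustrative choices and not part of the original report. The functions are defined inside `main()` only for convenience, and the absolute numbers will differ by machine.

```python
import timeit


def per_call_ns(fn, arg, number=100_000, repeat=5):
    """Return the best-of-`repeat` time per call of fn(arg), in nanoseconds."""
    timer = timeit.Timer("fn(arg)", globals={"fn": fn, "arg": arg})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


def main():
    try:
        import numba
    except ImportError:
        print("Numba is not installed; nothing to measure.")
        return

    @numba.njit
    def func(x, y=1.0, z=1):
        return x + x + y + z

    func(2.2)  # trigger compilation before timing
    print(f"func : {per_call_ns(func, 2.2):.0f} ns per call")

    # Same body, compiled second, as in the reproducer above.
    @numba.njit
    def func2(x, y=1.0, z=1):
        return x + x + y + z

    func2(2.2)  # trigger compilation before timing
    print(f"func2: {per_call_ns(func2, 2.2):.0f} ns per call")


if __name__ == "__main__":
    main()
```

If the issue reproduces, the second figure should be much larger than the first even though the two function bodies are identical.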
By the way, I have found several cases where Numba performs SIMD optimizations that make the code much faster than what a C++ compiler produces. With enough coaxing, MSVC will indeed do the same optimizations in this case, but it was very challenging to get working something that Numba did without even letting me know!
On the other hand, the MSVC experience involved compiler flags for debug messages that slowed compilation way down, and MSVC still made me do things that Numba/LLVM didn’t need me to do at all. It really feels like a mediocre Python developer can surpass a pretty good C++ programmer these days, thanks to all of your incredible work!
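A note for reading the debug output below: the symbol names in it, such as `func$241` and `omitted$28default$3d1$29`, appear to escape characters outside `[a-zA-Z0-9_]` as `$xx`, where `xx` is the hexadecimal character code. Under that assumption, a small helper (the name `unescape` is hypothetical and not a Numba API) makes the names readable. It handles only single-byte ASCII escapes and does not strip the length prefixes or the `N…E` nesting markers.

```python
import re


def unescape(fragment: str) -> str:
    """Replace each `$xx` hex escape with the ASCII character it encodes."""
    return re.sub(
        r"\$([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        fragment,
    )


print(unescape("func$241"))                  # func$1
print(unescape("omitted$28default$3d1$29"))  # omitted(default=1)
```

The decoded `omitted(default=1)` matches the type that the type-inference section of the dump reports for `arg.z`.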
Full output with NUMBA_DEBUG=1 (output for both function compilations):
================================================================================
--------------------------FUNCTION OPTIMIZED DUMP nrt---------------------------
; ModuleID = 'nrt'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@.const.picklebuf.140553090603008 = internal constant { i8*, i32, i8* } { i8* getelementptr inbounds ([102 x i8], [102 x i8]* @.const.pickledata.140553090603008, i32 0, i32 0), i32 102, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.const.pickledata.140553090603008.sha1, i32 0, i32 0) }
@.const.pickledata.140553090603008 = internal constant [102 x i8] c"\80\04\95[\00\00\00\00\00\00\00\8C\08builtins\94\8C\0CRuntimeError\94\93\94\8C6numba jitted function aborted due to unresolved symbol\94\85\94N\87\94."
@.const.pickledata.140553090603008.sha1 = internal constant [20 x i8] c"\97\BE\DC\DF\EC\8E\80\B7\09>P\CE%\EDV\F7r\0E\0C\9C"
define i64 @nrt_atomic_add(i64* %.1) {
.3:
%.4 = atomicrmw add i64* %.1, i64 1 monotonic
%.5 = add i64 %.4, 1
ret i64 %.5
}
define i64 @nrt_atomic_sub(i64* %.1) {
.3:
%.4 = atomicrmw sub i64* %.1, i64 1 monotonic
%.5 = sub i64 %.4, 1
ret i64 %.5
}
define i32 @nrt_atomic_cas(i64* %.1, i64 %.2, i64 %.3, i64* %.4) {
.6:
%.7 = cmpxchg i64* %.1, i64 %.2, i64 %.3 monotonic monotonic
%.8 = extractvalue { i64, i1 } %.7, 0
%.9 = extractvalue { i64, i1 } %.7, 1
store i64 %.8, i64* %.4
%.11 = zext i1 %.9 to i32
ret i32 %.11
}
define i8* @NRT_MemInfo_data_fast(i8* %.1) {
.3:
%.4 = bitcast i8* %.1 to { i64, i8*, i8*, i8*, i64 }*
%.5 = getelementptr { i64, i8*, i8*, i8*, i64 }, { i64, i8*, i8*, i8*, i64 }* %.4, i32 0, i32 3
%.6 = load i8*, i8** %.5
ret i8* %.6
}
; Function Attrs: noinline
define void @NRT_incref(i8* %.1) #0 {
.3:
%.4 = icmp eq i8* %.1, null
br i1 %.4, label %.3.if, label %.3.endif, !prof !0
.3.if: ; preds = %.3
ret void
.3.endif: ; preds = %.3
%.7 = bitcast i8* %.1 to i64*
%.8 = call i64 @nrt_atomic_add(i64* %.7)
ret void
}
; Function Attrs: noinline
define void @NRT_decref(i8* %.1) #0 {
.3:
%.4 = icmp eq i8* %.1, null
br i1 %.4, label %.3.if, label %.3.endif, !prof !0
.3.if: ; preds = %.3.endif.if, %.3.endif, %.3
ret void
.3.endif: ; preds = %.3
fence release
%.8 = bitcast i8* %.1 to i64*
%.9 = call i64 @nrt_atomic_sub(i64* %.8)
%.10 = icmp eq i64 %.9, 0
br i1 %.10, label %.3.endif.if, label %.3.if, !prof !0
.3.endif.if: ; preds = %.3.endif
fence acquire
call void @NRT_MemInfo_call_dtor(i8* %.1)
br label %.3.if
}
declare void @NRT_MemInfo_call_dtor(i8*)
define i32 @nrt_unresolved_abort(i8** %.1, { i8*, i32, i8* }** %.2) {
.4:
store { i8*, i32, i8* }* @.const.picklebuf.140553090603008, { i8*, i32, i8* }** %.2
ret i32 1
}
attributes #0 = { noinline }
!0 = !{!"branch_weights", i32 1, i32 99}
================================================================================
================================================================================
-------------------------------OPTIMIZED DUMP nrt-------------------------------
; ModuleID = 'nrt'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@.const.picklebuf.140553090603008 = internal constant { i8*, i32, i8* } { i8* getelementptr inbounds ([102 x i8], [102 x i8]* @.const.pickledata.140553090603008, i32 0, i32 0), i32 102, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.const.pickledata.140553090603008.sha1, i32 0, i32 0) }
@.const.pickledata.140553090603008 = internal constant [102 x i8] c"\80\04\95[\00\00\00\00\00\00\00\8C\08builtins\94\8C\0CRuntimeError\94\93\94\8C6numba jitted function aborted due to unresolved symbol\94\85\94N\87\94."
@.const.pickledata.140553090603008.sha1 = internal constant [20 x i8] c"\97\BE\DC\DF\EC\8E\80\B7\09>P\CE%\EDV\F7r\0E\0C\9C"
; Function Attrs: nofree norecurse nounwind
define i64 @nrt_atomic_add(i64* nocapture %.1) local_unnamed_addr #0 {
.3:
%.4 = atomicrmw add i64* %.1, i64 1 monotonic
%.5 = add i64 %.4, 1
ret i64 %.5
}
; Function Attrs: nofree norecurse nounwind
define i64 @nrt_atomic_sub(i64* nocapture %.1) local_unnamed_addr #0 {
.3:
%.4 = atomicrmw sub i64* %.1, i64 1 monotonic
%.5 = add i64 %.4, -1
ret i64 %.5
}
; Function Attrs: nofree norecurse nounwind
define i32 @nrt_atomic_cas(i64* nocapture %.1, i64 %.2, i64 %.3, i64* nocapture %.4) local_unnamed_addr #0 {
.6:
%.7 = cmpxchg i64* %.1, i64 %.2, i64 %.3 monotonic monotonic
%.8 = extractvalue { i64, i1 } %.7, 0
%.9 = extractvalue { i64, i1 } %.7, 1
store i64 %.8, i64* %.4, align 8
%.11 = zext i1 %.9 to i32
ret i32 %.11
}
; Function Attrs: norecurse nounwind readonly
define i8* @NRT_MemInfo_data_fast(i8* nocapture readonly %.1) local_unnamed_addr #1 {
.3:
%.5 = getelementptr i8, i8* %.1, i64 24
%0 = bitcast i8* %.5 to i8**
%.6 = load i8*, i8** %0, align 8
ret i8* %.6
}
; Function Attrs: nofree noinline norecurse nounwind
define void @NRT_incref(i8* %.1) local_unnamed_addr #2 {
.3:
%.4 = icmp eq i8* %.1, null
br i1 %.4, label %.3.if, label %.3.endif, !prof !0
.3.if: ; preds = %.3
ret void
.3.endif: ; preds = %.3
%.7 = bitcast i8* %.1 to i64*
%.4.i = atomicrmw add i64* %.7, i64 1 monotonic
ret void
}
; Function Attrs: noinline
define void @NRT_decref(i8* %.1) local_unnamed_addr #3 {
.3:
%.4 = icmp eq i8* %.1, null
br i1 %.4, label %.3.if, label %.3.endif, !prof !0
.3.if: ; preds = %.3.endif, %.3
ret void
.3.endif: ; preds = %.3
fence release
%.8 = bitcast i8* %.1 to i64*
%.4.i = atomicrmw sub i64* %.8, i64 1 monotonic
%.10 = icmp eq i64 %.4.i, 1
br i1 %.10, label %.3.endif.if, label %.3.if, !prof !0
.3.endif.if: ; preds = %.3.endif
fence acquire
tail call void @NRT_MemInfo_call_dtor(i8* nonnull %.1)
ret void
}
declare void @NRT_MemInfo_call_dtor(i8*) local_unnamed_addr
; Function Attrs: nofree norecurse nounwind writeonly
define i32 @nrt_unresolved_abort(i8** nocapture readnone %.1, { i8*, i32, i8* }** nocapture %.2) local_unnamed_addr #4 {
.4:
store { i8*, i32, i8* }* @.const.picklebuf.140553090603008, { i8*, i32, i8* }** %.2, align 8
ret i32 1
}
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #5
attributes #0 = { nofree norecurse nounwind }
attributes #1 = { norecurse nounwind readonly }
attributes #2 = { nofree noinline norecurse nounwind }
attributes #3 = { noinline }
attributes #4 = { nofree norecurse nounwind writeonly }
attributes #5 = { nounwind }
!0 = !{!"branch_weights", i32 1, i32 99}
================================================================================
================================================================================
----------------------------------ASSEMBLY nrt----------------------------------
.text
.file "<string>"
.globl nrt_atomic_add
.p2align 4, 0x90
.type nrt_atomic_add,@function
nrt_atomic_add:
movl $1, %eax
lock xaddq %rax, (%rdi)
incq %rax
retq
.Lfunc_end0:
.size nrt_atomic_add, .Lfunc_end0-nrt_atomic_add
.globl nrt_atomic_sub
.p2align 4, 0x90
.type nrt_atomic_sub,@function
nrt_atomic_sub:
movq $-1, %rax
lock xaddq %rax, (%rdi)
decq %rax
retq
.Lfunc_end1:
.size nrt_atomic_sub, .Lfunc_end1-nrt_atomic_sub
.globl nrt_atomic_cas
.p2align 4, 0x90
.type nrt_atomic_cas,@function
nrt_atomic_cas:
movq %rsi, %rax
xorl %esi, %esi
lock cmpxchgq %rdx, (%rdi)
sete %sil
movq %rax, (%rcx)
movl %esi, %eax
retq
.Lfunc_end2:
.size nrt_atomic_cas, .Lfunc_end2-nrt_atomic_cas
.globl NRT_MemInfo_data_fast
.p2align 4, 0x90
.type NRT_MemInfo_data_fast,@function
NRT_MemInfo_data_fast:
movq 24(%rdi), %rax
retq
.Lfunc_end3:
.size NRT_MemInfo_data_fast, .Lfunc_end3-NRT_MemInfo_data_fast
.globl NRT_incref
.p2align 4, 0x90
.type NRT_incref,@function
NRT_incref:
testq %rdi, %rdi
je .LBB4_1
lock incq (%rdi)
retq
.LBB4_1:
retq
.Lfunc_end4:
.size NRT_incref, .Lfunc_end4-NRT_incref
.globl NRT_decref
.p2align 4, 0x90
.type NRT_decref,@function
NRT_decref:
.cfi_startproc
testq %rdi, %rdi
je .LBB5_2
#MEMBARRIER
lock decq (%rdi)
je .LBB5_3
.LBB5_2:
retq
.LBB5_3:
#MEMBARRIER
movabsq $NRT_MemInfo_call_dtor, %rax
jmpq *%rax
.Lfunc_end5:
.size NRT_decref, .Lfunc_end5-NRT_decref
.cfi_endproc
.globl nrt_unresolved_abort
.p2align 4, 0x90
.type nrt_unresolved_abort,@function
nrt_unresolved_abort:
movabsq $.const.picklebuf.140553090603008, %rax
movq %rax, (%rsi)
movl $1, %eax
retq
.Lfunc_end6:
.size nrt_unresolved_abort, .Lfunc_end6-nrt_unresolved_abort
.type .const.picklebuf.140553090603008,@object
.section .rodata,"a",@progbits
.p2align 4
.const.picklebuf.140553090603008:
.quad .const.pickledata.140553090603008
.long 102
.zero 4
.quad .const.pickledata.140553090603008.sha1
.size .const.picklebuf.140553090603008, 24
.type .const.pickledata.140553090603008,@object
.p2align 4
.const.pickledata.140553090603008:
.ascii "\200\004\225[\000\000\000\000\000\000\000\214\bbuiltins\224\214\fRuntimeError\224\223\224\2146numba jitted function aborted due to unresolved symbol\224\205\224N\207\224."
.size .const.pickledata.140553090603008, 102
.type .const.pickledata.140553090603008.sha1,@object
.p2align 4
.const.pickledata.140553090603008.sha1:
.ascii "\227\276\334\337\354\216\200\267\t>P\316%\355V\367r\016\f\234"
.size .const.pickledata.140553090603008.sha1, 20
.section ".note.GNU-stack","",@progbits
================================================================================
---------------------------------IR DUMP: func----------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
z = arg(2, name=z) ['z']
$6binary_add.2 = x + x ['$6binary_add.2', 'x', 'x']
$10binary_add.4 = $6binary_add.2 + y ['$10binary_add.4', '$6binary_add.2', 'y']
$14binary_add.6 = $10binary_add.4 + z ['$10binary_add.4', '$14binary_add.6', 'z']
$16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
return $16return_value.7 ['$16return_value.7']
---------------------------------IR DUMP: func----------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
z = arg(2, name=z) ['z']
$6binary_add.2 = x + x ['$6binary_add.2', 'x', 'x']
$10binary_add.4 = $6binary_add.2 + y ['$10binary_add.4', '$6binary_add.2', 'y']
$14binary_add.6 = $10binary_add.4 + z ['$10binary_add.4', '$14binary_add.6', 'z']
$16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
return $16return_value.7 ['$16return_value.7']
-------------------------------SSA IR DUMP: func--------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
z = arg(2, name=z) ['z']
$6binary_add.2 = x + x ['$6binary_add.2', 'x', 'x']
$10binary_add.4 = $6binary_add.2 + y ['$10binary_add.4', '$6binary_add.2', 'y']
$14binary_add.6 = $10binary_add.4 + z ['$10binary_add.4', '$14binary_add.6', 'z']
$16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
return $16return_value.7 ['$16return_value.7']
-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
$14binary_add.6 := float64,
$16return_value.7 := float64,
$6binary_add.2 := float64,
arg.x := float64,
arg.y := omitted(default=1.0),
arg.z := omitted(default=1),
x := float64,
y := float64,
z := Literal[int](1)]
-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
$14binary_add.6 := float64,
$16return_value.7 := float64,
$6binary_add.2 := float64,
arg.x := float64,
arg.y := omitted(default=1.0),
arg.z := omitted(default=1),
x := float64,
y := float64,
z := Literal[int](1)]
---------------------------------Variable types---------------------------------
{'$10binary_add.4': float64,
'$14binary_add.6': float64,
'$16return_value.7': float64,
'$6binary_add.2': float64,
'arg.x': float64,
'arg.y': omitted(default=1.0),
'arg.z': omitted(default=1),
'x': float64,
'y': float64,
'z': Literal[int](1)}
----------------------------------Return type-----------------------------------
float64
-----------------------------------Call types-----------------------------------
{$10binary_add.4 + z: (float64, float64) -> float64,
$6binary_add.2 + y: (float64, float64) -> float64,
x + x: (float64, float64) -> float64}
--------------------LLVM DUMP <function descriptor 'func$1'>--------------------
; ModuleID = "func$1"
target triple = "x86_64-unknown-linux-gnu"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
@"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
define i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %"retptr", {i8*, i32, i8*}** noalias nocapture %"excinfo", double %"arg.x")
{
entry:
%"x" = alloca double
store double 0.0, double* %"x"
%"y" = alloca double
store double 0.0, double* %"y"
%"z" = alloca i64
store i64 0, i64* %"z"
%"$6binary_add.2" = alloca double
store double 0.0, double* %"$6binary_add.2"
%"$10binary_add.4" = alloca double
store double 0.0, double* %"$10binary_add.4"
%"$14binary_add.6" = alloca double
store double 0.0, double* %"$14binary_add.6"
%"$16return_value.7" = alloca double
store double 0.0, double* %"$16return_value.7"
br label %"B0"
B0:
%".6" = load double, double* %"x"
store double %"arg.x", double* %"x"
%".9" = load double, double* %"y"
store double 0x3ff0000000000000, double* %"y"
%".12" = load i64, i64* %"z"
store i64 1, i64* %"z"
%".14" = load double, double* %"x"
%".15" = load double, double* %"x"
%".16" = fadd double %".14", %".15"
%".18" = load double, double* %"$6binary_add.2"
store double %".16", double* %"$6binary_add.2"
%".20" = load double, double* %"x"
store double 0.0, double* %"x"
%".22" = load double, double* %"$6binary_add.2"
%".23" = load double, double* %"y"
%".24" = fadd double %".22", %".23"
%".26" = load double, double* %"$10binary_add.4"
store double %".24", double* %"$10binary_add.4"
%".28" = load double, double* %"y"
store double 0.0, double* %"y"
%".30" = load double, double* %"$6binary_add.2"
store double 0.0, double* %"$6binary_add.2"
%".32" = load double, double* %"$10binary_add.4"
%".33" = load i64, i64* %"z"
%".34" = sitofp i64 1 to double
%".35" = fadd double %".32", %".34"
%".37" = load double, double* %"$14binary_add.6"
store double %".35", double* %"$14binary_add.6"
%".39" = load i64, i64* %"z"
store i64 0, i64* %"z"
%".41" = load double, double* %"$10binary_add.4"
store double 0.0, double* %"$10binary_add.4"
%".43" = load double, double* %"$14binary_add.6"
%".45" = load double, double* %"$16return_value.7"
store double %".43", double* %"$16return_value.7"
%".47" = load double, double* %"$14binary_add.6"
store double 0.0, double* %"$14binary_add.6"
%".49" = load double, double* %"$16return_value.7"
store double %".49", double* %"retptr"
ret i32 0
}
================================================================================
================================================================================
--------------------------FUNCTION OPTIMIZED DUMP func--------------------------
; ModuleID = 'func'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
@.const.func = internal constant [5 x i8] c"func\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [112 x i8] c"missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"
@_Py_NoneStruct = external global i8
@PyExc_StopIteration = external global i8
@PyExc_SystemError = external global i8
@".const.unknown error when calling native function" = internal constant [43 x i8] c"unknown error when calling native function\00"
define i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture %excinfo, double %arg.x) {
entry:
%.16 = fadd double %arg.x, %arg.x
%.24 = fadd double %.16, 1.000000e+00
%.35 = fadd double %.24, 1.000000e+00
store double %.35, double* %retptr
ret i32 0
}
define i8* @"_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* %py_closure, i8* %py_args, i8* %py_kws) {
entry:
%.5 = alloca i8*
%.6 = alloca i8*
%.7 = alloca i8*
%.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.const.func, i32 0, i32 0), i64 3, i64 3, i8** %.5, i8** %.6, i8** %.7)
%.9 = icmp eq i32 %.8, 0
%.31 = alloca double
store double 0.000000e+00, double* %.31
%excinfo = alloca { i8*, i32, i8* }*
store { i8*, i32, i8* }* null, { i8*, i32, i8* }** %excinfo
br i1 %.9, label %entry.if, label %entry.endif, !prof !0
entry.if: ; preds = %entry.endif.endif.endif.endif.endif, %entry.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.if.if, %entry.endif.endif, %entry.endif.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.endif.endif.endif, %entry
ret i8* null
entry.endif: ; preds = %entry
%.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
%.14 = ptrtoint i8* %.13 to i64
%.15 = add i64 %.14, 16
%.16 = inttoptr i64 %.15 to i8*
%.18 = icmp eq i8* null, %.13
br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0
entry.endif.if: ; preds = %entry.endif
call void @PyErr_SetString(i8* @PyExc_RuntimeError, i8* getelementptr inbounds ([112 x i8], [112 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i32 0, i32 0))
ret i8* null
entry.endif.endif: ; preds = %entry.endif
%.22 = load i8*, i8** %.5
%.23 = call i8* @PyNumber_Float(i8* %.22)
%.24 = call double @PyFloat_AsDouble(i8* %.23)
call void @Py_DecRef(i8* %.23)
%.26 = call i8* @PyErr_Occurred()
%.27 = icmp ne i8* null, %.26
br i1 %.27, label %entry.if, label %entry.endif.endif.endif, !prof !0
entry.endif.endif.endif: ; preds = %entry.endif.endif
store double 0.000000e+00, double* %.31
%.35 = call i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* %.31, { i8*, i32, i8* }** %excinfo, double %.24)
%.36 = load { i8*, i32, i8* }*, { i8*, i32, i8* }** %excinfo
%.37 = icmp eq i32 %.35, 0
%.38 = icmp eq i32 %.35, -2
%.41 = or i1 %.37, %.38
%.43 = icmp sge i32 %.35, 1
%.45 = load double, double* %.31
switch i32 %.35, label %entry.endif.endif.endif.endif [
i32 -2, label %entry.endif.endif.endif.if
i32 0, label %entry.endif.endif.endif.if
]
entry.endif.endif.endif.if: ; preds = %entry.endif.endif.endif, %entry.endif.endif.endif
br i1 %.38, label %entry.endif.endif.endif.if.if, label %entry.endif.endif.endif.if.endif
entry.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif
br i1 %.43, label %entry.endif.endif.endif.endif.if, label %entry.endif.endif.endif.endif.endif
entry.endif.endif.endif.if.if: ; preds = %entry.endif.endif.endif.if
call void @Py_IncRef(i8* @_Py_NoneStruct)
ret i8* @_Py_NoneStruct
entry.endif.endif.endif.if.endif: ; preds = %entry.endif.endif.endif.if
%.50 = call i8* @PyFloat_FromDouble(double %.45)
ret i8* %.50
entry.endif.endif.endif.endif.if: ; preds = %entry.endif.endif.endif.endif
call void @PyErr_Clear()
%.55 = load { i8*, i32, i8* }, { i8*, i32, i8* }* %.36
%.56 = extractvalue { i8*, i32, i8* } %.55, 0
%.58 = extractvalue { i8*, i32, i8* } %.55, 1
%.60 = extractvalue { i8*, i32, i8* } %.55, 2
%.61 = call i8* @numba_unpickle(i8* %.56, i32 %.58, i8* %.60)
%.62 = icmp ne i8* null, %.61
br i1 %.62, label %entry.endif.endif.endif.endif.if.if, label %entry.if, !prof !1
entry.endif.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif.endif
switch i32 %.35, label %entry.endif.endif.endif.endif.endif.endif.endif [
i32 -3, label %entry.endif.endif.endif.endif.endif.if
i32 -1, label %entry.if
]
entry.endif.endif.endif.endif.if.if: ; preds = %entry.endif.endif.endif.endif.if
call void @numba_do_raise(i8* %.61)
br label %entry.if
entry.endif.endif.endif.endif.endif.if: ; preds = %entry.endif.endif.endif.endif.endif
call void @PyErr_SetNone(i8* @PyExc_StopIteration)
br label %entry.if
entry.endif.endif.endif.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif.endif.endif
call void @PyErr_SetString(i8* @PyExc_SystemError, i8* getelementptr inbounds ([43 x i8], [43 x i8]* @".const.unknown error when calling native function", i32 0, i32 0))
br label %entry.if
}
declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...)
declare void @PyErr_SetString(i8*, i8*)
declare i8* @PyNumber_Float(i8*)
declare double @PyFloat_AsDouble(i8*)
declare void @Py_DecRef(i8*)
declare i8* @PyErr_Occurred()
declare void @Py_IncRef(i8*)
declare i8* @PyFloat_FromDouble(double)
declare void @PyErr_Clear()
declare i8* @numba_unpickle(i8*, i32, i8*)
declare void @numba_do_raise(i8*)
declare void @PyErr_SetNone(i8*)
!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}
================================================================================
================================================================================
------------------------------OPTIMIZED DUMP func-------------------------------
; ModuleID = 'func'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common local_unnamed_addr global i8* null
@.const.func = internal constant [5 x i8] c"func\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [112 x i8] c"missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"
; Function Attrs: nofree norecurse nounwind writeonly
define i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, double %arg.x) local_unnamed_addr #0 {
entry:
%.16 = fadd double %arg.x, %arg.x
%.24 = fadd double %.16, 1.000000e+00
%.35 = fadd double %.24, 1.000000e+00
store double %.35, double* %retptr, align 8
ret i32 0
}
define i8* @"_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
%.5 = alloca i8*, align 8
%.6 = alloca i8*, align 8
%.7 = alloca i8*, align 8
%.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.const.func, i64 0, i64 0), i64 3, i64 3, i8** nonnull %.5, i8** nonnull %.6, i8** nonnull %.7)
%.9 = icmp eq i32 %.8, 0
br i1 %.9, label %entry.if, label %entry.endif, !prof !0
entry.if: ; preds = %entry.endif.endif, %entry
ret i8* null
entry.endif: ; preds = %entry
%.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", align 8
%.18 = icmp eq i8* %.13, null
br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0
entry.endif.if: ; preds = %entry.endif
call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([112 x i8], [112 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i64 0, i64 0))
ret i8* null
entry.endif.endif: ; preds = %entry.endif
%.22 = load i8*, i8** %.5, align 8
%.23 = call i8* @PyNumber_Float(i8* %.22)
%.24 = call double @PyFloat_AsDouble(i8* %.23)
call void @Py_DecRef(i8* %.23)
%.26 = call i8* @PyErr_Occurred()
%.27 = icmp eq i8* %.26, null
br i1 %.27, label %entry.endif.endif.endif, label %entry.if, !prof !1
entry.endif.endif.endif: ; preds = %entry.endif.endif
%.16.i = fadd double %.24, %.24
%.24.i = fadd double %.16.i, 1.000000e+00
%.35.i = fadd double %.24.i, 1.000000e+00
%.50 = call i8* @PyFloat_FromDouble(double %.35.i)
ret i8* %.50
}
declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr
declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr
declare i8* @PyNumber_Float(i8*) local_unnamed_addr
declare double @PyFloat_AsDouble(i8*) local_unnamed_addr
declare void @Py_DecRef(i8*) local_unnamed_addr
declare i8* @PyErr_Occurred() local_unnamed_addr
declare i8* @PyFloat_FromDouble(double) local_unnamed_addr
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1
attributes #0 = { nofree norecurse nounwind writeonly }
attributes #1 = { nounwind }
!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}
================================================================================
================================================================================
---------------------------------ASSEMBLY func----------------------------------
.text
.file "<string>"
.section .rodata.cst8,"aM",@progbits,8
.p2align 3
.LCPI0_0:
.quad 4607182418800017408
.text
.globl _ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.p2align 4, 0x90
.type _ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
vaddsd %xmm0, %xmm0, %xmm0
movabsq $.LCPI0_0, %rax
vmovsd (%rax), %xmm1
vaddsd %xmm1, %xmm0, %xmm0
vaddsd %xmm1, %xmm0, %xmm0
vmovsd %xmm0, (%rdi)
xorl %eax, %eax
retq
.Lfunc_end0:
.size _ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end0-_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.section .rodata.cst8,"aM",@progbits,8
.p2align 3
.LCPI1_0:
.quad 4607182418800017408
.text
.globl _ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.p2align 4, 0x90
.type _ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
subq $48, %rsp
.cfi_def_cfa_offset 64
.cfi_offset %rbx, -16
movq %rsi, %rdi
leaq 32(%rsp), %rax
movq %rax, (%rsp)
movabsq $.const.func, %rsi
movabsq $PyArg_UnpackTuple, %rbx
leaq 24(%rsp), %r8
leaq 40(%rsp), %r9
movl $3, %edx
movl $3, %ecx
xorl %eax, %eax
callq *%rbx
testl %eax, %eax
je .LBB1_1
movabsq $_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, %rax
cmpq $0, (%rax)
je .LBB1_4
movq 24(%rsp), %rdi
movabsq $PyNumber_Float, %rax
callq *%rax
movq %rax, %rbx
movabsq $PyFloat_AsDouble, %rax
movq %rbx, %rdi
callq *%rax
vmovsd %xmm0, 16(%rsp)
movabsq $Py_DecRef, %rax
movq %rbx, %rdi
callq *%rax
movabsq $PyErr_Occurred, %rax
callq *%rax
testq %rax, %rax
jne .LBB1_1
vmovsd 16(%rsp), %xmm0
vaddsd %xmm0, %xmm0, %xmm0
movabsq $.LCPI1_0, %rax
vmovsd (%rax), %xmm1
vaddsd %xmm1, %xmm0, %xmm0
vaddsd %xmm1, %xmm0, %xmm0
movabsq $PyFloat_FromDouble, %rax
callq *%rax
addq $48, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
retq
.LBB1_4:
.cfi_def_cfa_offset 64
movabsq $PyExc_RuntimeError, %rdi
movabsq $".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", %rsi
movabsq $PyErr_SetString, %rax
callq *%rax
.LBB1_1:
xorl %eax, %eax
addq $48, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
retq
.Lfunc_end1:
.size _ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end1-_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.cfi_endproc
.type _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@object
.comm _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,8,8
.type .const.func,@object
.section .rodata,"a",@progbits
.const.func:
.asciz "func"
.size .const.func, 5
.type ".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29",@object
.p2align 4
".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29":
.asciz "missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
.size ".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", 112
.section ".note.GNU-stack","",@progbits
================================================================================
---------------------------------IR DUMP: func2---------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
z = arg(2, name=z) ['z']
$6binary_add.2 = x + x ['$6binary_add.2', 'x', 'x']
$10binary_add.4 = $6binary_add.2 + y ['$10binary_add.4', '$6binary_add.2', 'y']
$14binary_add.6 = $10binary_add.4 + z ['$10binary_add.4', '$14binary_add.6', 'z']
$16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
return $16return_value.7 ['$16return_value.7']
---------------------------------IR DUMP: func2---------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
z = arg(2, name=z) ['z']
$6binary_add.2 = x + x ['$6binary_add.2', 'x', 'x']
$10binary_add.4 = $6binary_add.2 + y ['$10binary_add.4', '$6binary_add.2', 'y']
$14binary_add.6 = $10binary_add.4 + z ['$10binary_add.4', '$14binary_add.6', 'z']
$16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
return $16return_value.7 ['$16return_value.7']
-------------------------------SSA IR DUMP: func2-------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
z = arg(2, name=z) ['z']
$6binary_add.2 = x + x ['$6binary_add.2', 'x', 'x']
$10binary_add.4 = $6binary_add.2 + y ['$10binary_add.4', '$6binary_add.2', 'y']
$14binary_add.6 = $10binary_add.4 + z ['$10binary_add.4', '$14binary_add.6', 'z']
$16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
return $16return_value.7 ['$16return_value.7']
-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
$14binary_add.6 := float64,
$16return_value.7 := float64,
$6binary_add.2 := float64,
arg.x := float64,
arg.y := omitted(default=1.0),
arg.z := omitted(default=1),
x := float64,
y := float64,
z := Literal[int](1)]
-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
$14binary_add.6 := float64,
$16return_value.7 := float64,
$6binary_add.2 := float64,
arg.x := float64,
arg.y := omitted(default=1.0),
arg.z := omitted(default=1),
x := float64,
y := float64,
z := Literal[int](1)]
---------------------------------Variable types---------------------------------
{'$10binary_add.4': float64,
'$14binary_add.6': float64,
'$16return_value.7': float64,
'$6binary_add.2': float64,
'arg.x': float64,
'arg.y': omitted(default=1.0),
'arg.z': omitted(default=1),
'x': float64,
'y': float64,
'z': Literal[int](1)}
----------------------------------Return type-----------------------------------
float64
-----------------------------------Call types-----------------------------------
{$10binary_add.4 + z: (float64, float64) -> float64,
$6binary_add.2 + y: (float64, float64) -> float64,
x + x: (float64, float64) -> float64}
-------------------LLVM DUMP <function descriptor 'func2$2'>--------------------
; ModuleID = "func2$2"
target triple = "x86_64-unknown-linux-gnu"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
@"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
define i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %"retptr", {i8*, i32, i8*}** noalias nocapture %"excinfo", double %"arg.x")
{
entry:
%"x" = alloca double
store double 0.0, double* %"x"
%"y" = alloca double
store double 0.0, double* %"y"
%"z" = alloca i64
store i64 0, i64* %"z"
%"$6binary_add.2" = alloca double
store double 0.0, double* %"$6binary_add.2"
%"$10binary_add.4" = alloca double
store double 0.0, double* %"$10binary_add.4"
%"$14binary_add.6" = alloca double
store double 0.0, double* %"$14binary_add.6"
%"$16return_value.7" = alloca double
store double 0.0, double* %"$16return_value.7"
br label %"B0"
B0:
%".6" = load double, double* %"x"
store double %"arg.x", double* %"x"
%".9" = load double, double* %"y"
store double 0x3ff0000000000000, double* %"y"
%".12" = load i64, i64* %"z"
store i64 1, i64* %"z"
%".14" = load double, double* %"x"
%".15" = load double, double* %"x"
%".16" = fadd double %".14", %".15"
%".18" = load double, double* %"$6binary_add.2"
store double %".16", double* %"$6binary_add.2"
%".20" = load double, double* %"x"
store double 0.0, double* %"x"
%".22" = load double, double* %"$6binary_add.2"
%".23" = load double, double* %"y"
%".24" = fadd double %".22", %".23"
%".26" = load double, double* %"$10binary_add.4"
store double %".24", double* %"$10binary_add.4"
%".28" = load double, double* %"y"
store double 0.0, double* %"y"
%".30" = load double, double* %"$6binary_add.2"
store double 0.0, double* %"$6binary_add.2"
%".32" = load double, double* %"$10binary_add.4"
%".33" = load i64, i64* %"z"
%".34" = sitofp i64 1 to double
%".35" = fadd double %".32", %".34"
%".37" = load double, double* %"$14binary_add.6"
store double %".35", double* %"$14binary_add.6"
%".39" = load i64, i64* %"z"
store i64 0, i64* %"z"
%".41" = load double, double* %"$10binary_add.4"
store double 0.0, double* %"$10binary_add.4"
%".43" = load double, double* %"$14binary_add.6"
%".45" = load double, double* %"$16return_value.7"
store double %".43", double* %"$16return_value.7"
%".47" = load double, double* %"$14binary_add.6"
store double 0.0, double* %"$14binary_add.6"
%".49" = load double, double* %"$16return_value.7"
store double %".49", double* %"retptr"
ret i32 0
}
================================================================================
================================================================================
-------------------------FUNCTION OPTIMIZED DUMP func2--------------------------
; ModuleID = 'func2'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
@.const.func2 = internal constant [6 x i8] c"func2\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [113 x i8] c"missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"
@_Py_NoneStruct = external global i8
@PyExc_StopIteration = external global i8
@PyExc_SystemError = external global i8
@".const.unknown error when calling native function" = internal constant [43 x i8] c"unknown error when calling native function\00"
define i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture %excinfo, double %arg.x) {
entry:
%.16 = fadd double %arg.x, %arg.x
%.24 = fadd double %.16, 1.000000e+00
%.35 = fadd double %.24, 1.000000e+00
store double %.35, double* %retptr
ret i32 0
}
define i8* @"_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* %py_closure, i8* %py_args, i8* %py_kws) {
entry:
%.5 = alloca i8*
%.6 = alloca i8*
%.7 = alloca i8*
%.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.const.func2, i32 0, i32 0), i64 3, i64 3, i8** %.5, i8** %.6, i8** %.7)
%.9 = icmp eq i32 %.8, 0
%.31 = alloca double
store double 0.000000e+00, double* %.31
%excinfo = alloca { i8*, i32, i8* }*
store { i8*, i32, i8* }* null, { i8*, i32, i8* }** %excinfo
br i1 %.9, label %entry.if, label %entry.endif, !prof !0
entry.if: ; preds = %entry.endif.endif.endif.endif.endif, %entry.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.if.if, %entry.endif.endif, %entry.endif.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.endif.endif.endif, %entry
ret i8* null
entry.endif: ; preds = %entry
%.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
%.14 = ptrtoint i8* %.13 to i64
%.15 = add i64 %.14, 16
%.16 = inttoptr i64 %.15 to i8*
%.18 = icmp eq i8* null, %.13
br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0
entry.endif.if: ; preds = %entry.endif
call void @PyErr_SetString(i8* @PyExc_RuntimeError, i8* getelementptr inbounds ([113 x i8], [113 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i32 0, i32 0))
ret i8* null
entry.endif.endif: ; preds = %entry.endif
%.22 = load i8*, i8** %.5
%.23 = call i8* @PyNumber_Float(i8* %.22)
%.24 = call double @PyFloat_AsDouble(i8* %.23)
call void @Py_DecRef(i8* %.23)
%.26 = call i8* @PyErr_Occurred()
%.27 = icmp ne i8* null, %.26
br i1 %.27, label %entry.if, label %entry.endif.endif.endif, !prof !0
entry.endif.endif.endif: ; preds = %entry.endif.endif
store double 0.000000e+00, double* %.31
%.35 = call i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* %.31, { i8*, i32, i8* }** %excinfo, double %.24)
%.36 = load { i8*, i32, i8* }*, { i8*, i32, i8* }** %excinfo
%.37 = icmp eq i32 %.35, 0
%.38 = icmp eq i32 %.35, -2
%.41 = or i1 %.37, %.38
%.43 = icmp sge i32 %.35, 1
%.45 = load double, double* %.31
switch i32 %.35, label %entry.endif.endif.endif.endif [
i32 -2, label %entry.endif.endif.endif.if
i32 0, label %entry.endif.endif.endif.if
]
entry.endif.endif.endif.if: ; preds = %entry.endif.endif.endif, %entry.endif.endif.endif
br i1 %.38, label %entry.endif.endif.endif.if.if, label %entry.endif.endif.endif.if.endif
entry.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif
br i1 %.43, label %entry.endif.endif.endif.endif.if, label %entry.endif.endif.endif.endif.endif
entry.endif.endif.endif.if.if: ; preds = %entry.endif.endif.endif.if
call void @Py_IncRef(i8* @_Py_NoneStruct)
ret i8* @_Py_NoneStruct
entry.endif.endif.endif.if.endif: ; preds = %entry.endif.endif.endif.if
%.50 = call i8* @PyFloat_FromDouble(double %.45)
ret i8* %.50
entry.endif.endif.endif.endif.if: ; preds = %entry.endif.endif.endif.endif
call void @PyErr_Clear()
%.55 = load { i8*, i32, i8* }, { i8*, i32, i8* }* %.36
%.56 = extractvalue { i8*, i32, i8* } %.55, 0
%.58 = extractvalue { i8*, i32, i8* } %.55, 1
%.60 = extractvalue { i8*, i32, i8* } %.55, 2
%.61 = call i8* @numba_unpickle(i8* %.56, i32 %.58, i8* %.60)
%.62 = icmp ne i8* null, %.61
br i1 %.62, label %entry.endif.endif.endif.endif.if.if, label %entry.if, !prof !1
entry.endif.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif.endif
switch i32 %.35, label %entry.endif.endif.endif.endif.endif.endif.endif [
i32 -3, label %entry.endif.endif.endif.endif.endif.if
i32 -1, label %entry.if
]
entry.endif.endif.endif.endif.if.if: ; preds = %entry.endif.endif.endif.endif.if
call void @numba_do_raise(i8* %.61)
br label %entry.if
entry.endif.endif.endif.endif.endif.if: ; preds = %entry.endif.endif.endif.endif.endif
call void @PyErr_SetNone(i8* @PyExc_StopIteration)
br label %entry.if
entry.endif.endif.endif.endif.endif.endif.endif: ; preds = %entry.endif.endif.endif.endif.endif
call void @PyErr_SetString(i8* @PyExc_SystemError, i8* getelementptr inbounds ([43 x i8], [43 x i8]* @".const.unknown error when calling native function", i32 0, i32 0))
br label %entry.if
}
declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...)
declare void @PyErr_SetString(i8*, i8*)
declare i8* @PyNumber_Float(i8*)
declare double @PyFloat_AsDouble(i8*)
declare void @Py_DecRef(i8*)
declare i8* @PyErr_Occurred()
declare void @Py_IncRef(i8*)
declare i8* @PyFloat_FromDouble(double)
declare void @PyErr_Clear()
declare i8* @numba_unpickle(i8*, i32, i8*)
declare void @numba_do_raise(i8*)
declare void @PyErr_SetNone(i8*)
!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}
================================================================================
================================================================================
------------------------------OPTIMIZED DUMP func2------------------------------
; ModuleID = 'func2'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common local_unnamed_addr global i8* null
@.const.func2 = internal constant [6 x i8] c"func2\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [113 x i8] c"missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"
; Function Attrs: nofree norecurse nounwind writeonly
define i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, double %arg.x) local_unnamed_addr #0 {
entry:
%.16 = fadd double %arg.x, %arg.x
%.24 = fadd double %.16, 1.000000e+00
%.35 = fadd double %.24, 1.000000e+00
store double %.35, double* %retptr, align 8
ret i32 0
}
define i8* @"_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
%.5 = alloca i8*, align 8
%.6 = alloca i8*, align 8
%.7 = alloca i8*, align 8
%.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.const.func2, i64 0, i64 0), i64 3, i64 3, i8** nonnull %.5, i8** nonnull %.6, i8** nonnull %.7)
%.9 = icmp eq i32 %.8, 0
br i1 %.9, label %entry.if, label %entry.endif, !prof !0
entry.if: ; preds = %entry.endif.endif, %entry
ret i8* null
entry.endif: ; preds = %entry
%.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", align 8
%.18 = icmp eq i8* %.13, null
br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0
entry.endif.if: ; preds = %entry.endif
call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([113 x i8], [113 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i64 0, i64 0))
ret i8* null
entry.endif.endif: ; preds = %entry.endif
%.22 = load i8*, i8** %.5, align 8
%.23 = call i8* @PyNumber_Float(i8* %.22)
%.24 = call double @PyFloat_AsDouble(i8* %.23)
call void @Py_DecRef(i8* %.23)
%.26 = call i8* @PyErr_Occurred()
%.27 = icmp eq i8* %.26, null
br i1 %.27, label %entry.endif.endif.endif, label %entry.if, !prof !1
entry.endif.endif.endif: ; preds = %entry.endif.endif
%.16.i = fadd double %.24, %.24
%.24.i = fadd double %.16.i, 1.000000e+00
%.35.i = fadd double %.24.i, 1.000000e+00
%.50 = call i8* @PyFloat_FromDouble(double %.35.i)
ret i8* %.50
}
declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr
declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr
declare i8* @PyNumber_Float(i8*) local_unnamed_addr
declare double @PyFloat_AsDouble(i8*) local_unnamed_addr
declare void @Py_DecRef(i8*) local_unnamed_addr
declare i8* @PyErr_Occurred() local_unnamed_addr
declare i8* @PyFloat_FromDouble(double) local_unnamed_addr
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1
attributes #0 = { nofree norecurse nounwind writeonly }
attributes #1 = { nounwind }
!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}
================================================================================
================================================================================
---------------------------------ASSEMBLY func2---------------------------------
.text
.file "<string>"
.section .rodata.cst8,"aM",@progbits,8
.p2align 3
.LCPI0_0:
.quad 4607182418800017408
.text
.globl _ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.p2align 4, 0x90
.type _ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
vaddsd %xmm0, %xmm0, %xmm0
movabsq $.LCPI0_0, %rax
vmovsd (%rax), %xmm1
vaddsd %xmm1, %xmm0, %xmm0
vaddsd %xmm1, %xmm0, %xmm0
vmovsd %xmm0, (%rdi)
xorl %eax, %eax
retq
.Lfunc_end0:
.size _ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end0-_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.section .rodata.cst8,"aM",@progbits,8
.p2align 3
.LCPI1_0:
.quad 4607182418800017408
.text
.globl _ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.p2align 4, 0x90
.type _ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
subq $48, %rsp
.cfi_def_cfa_offset 64
.cfi_offset %rbx, -16
movq %rsi, %rdi
leaq 32(%rsp), %rax
movq %rax, (%rsp)
movabsq $.const.func2, %rsi
movabsq $PyArg_UnpackTuple, %rbx
leaq 24(%rsp), %r8
leaq 40(%rsp), %r9
movl $3, %edx
movl $3, %ecx
xorl %eax, %eax
callq *%rbx
testl %eax, %eax
je .LBB1_1
movabsq $_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, %rax
cmpq $0, (%rax)
je .LBB1_4
movq 24(%rsp), %rdi
movabsq $PyNumber_Float, %rax
callq *%rax
movq %rax, %rbx
movabsq $PyFloat_AsDouble, %rax
movq %rbx, %rdi
callq *%rax
vmovsd %xmm0, 16(%rsp)
movabsq $Py_DecRef, %rax
movq %rbx, %rdi
callq *%rax
movabsq $PyErr_Occurred, %rax
callq *%rax
testq %rax, %rax
jne .LBB1_1
vmovsd 16(%rsp), %xmm0
vaddsd %xmm0, %xmm0, %xmm0
movabsq $.LCPI1_0, %rax
vmovsd (%rax), %xmm1
vaddsd %xmm1, %xmm0, %xmm0
vaddsd %xmm1, %xmm0, %xmm0
movabsq $PyFloat_FromDouble, %rax
callq *%rax
addq $48, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
retq
.LBB1_4:
.cfi_def_cfa_offset 64
movabsq $PyExc_RuntimeError, %rdi
movabsq $".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", %rsi
movabsq $PyErr_SetString, %rax
callq *%rax
.LBB1_1:
xorl %eax, %eax
addq $48, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
retq
.Lfunc_end1:
.size _ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end1-_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
.cfi_endproc
.type _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@object
.comm _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,8,8
.type .const.func2,@object
.section .rodata,"a",@progbits
.const.func2:
.asciz "func2"
.size .const.func2, 6
.type ".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29",@object
.p2align 4
".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29":
.asciz "missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
.size ".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", 113
.section ".note.GNU-stack","",@progbits
================================================================================
I looked again at this issue this morning, and made a little bit of progress. I had noticed #6957, and it turns out that this might be related. I tentatively changed the key method of the Omitted class. I took out “id(self._value)” from the line, and this issue went away.
I am sure this is an important line and shouldn’t be removed, but it was nice to make a little bit of progress. I can confirm this is still an issue. Sincerely, Caleb
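To illustrate why dropping id(self._value) from the key could matter, here is a minimal sketch in plain Python. The classes IdKeyed and ValueKeyed are hypothetical stand-ins written for this example, not Numba's Omitted class. They only show that a key built from id() treats two equal default values as different whenever they are distinct objects, while a value-based key does not.

```python
class IdKeyed:
    """Hypothetical stand-in: the key includes the identity of the value."""

    def __init__(self, value):
        self._value = value

    @property
    def key(self):
        return type(self._value), id(self._value)


class ValueKeyed:
    """Hypothetical stand-in: the key includes the value itself."""

    def __init__(self, value):
        self._value = value

    @property
    def key(self):
        return type(self._value), self._value


# Two equal floats that are separate objects (in CPython, each float(...)
# call returns a new object while the previous one is still alive).
a = float("1.0")
b = float("1.0")

id_keys_match = IdKeyed(a).key == IdKeyed(b).key           # identity-based
value_keys_match = ValueKeyed(a).key == ValueKeyed(b).key  # value-based

print("equal values:", a == b)
print("id-based keys match:", id_keys_match)
print("value-based keys match:", value_keys_match)
```

Anything that is cached or looked up by the id-based key will see each new default object as a brand new type, which fits the symptom of the issue going away once the id() term is removed.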
Looking at this again, I think this is to do with the dispatcher function cache being missed as the typecodes of the Omitted values are different across invocations (hence changing the .key helps), though it’s not obvious what is going on.
In the above, the omitted float and int are (locally) 373 and 375 for func and the dispatcher cache bakes these in. When the %timeit runs on func the dispatcher matches the signature against the typecodes for the omitted float and int, gets a perfect match, and so just executes the function.
In the func2 invocation, the dispatcher “sees” a signature using the 373 and 375 typecodes, but the computed typecodes of the omitted float and int are (locally) 1682 and something else respectively (the code jumps out of the loop at no match on the first), i.e. no match, so there’s a recompile, hence the performance difference. What’s also “strange” is that it looks like each invocation of func2 gets a new set of computed typecodes for its arguments, but for some reason the same doesn’t happen in func.
I’m reasonably convinced that the performance difference comes down to the dispatcher matching the typecodes exactly in the first run of func and then subsequently not matching ever again, and the resulting recompilation is what’s causing the difference. This however is just a side effect of something else; it does not explain why the first is “typecode stable” and the second is not.
Demonstration:
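What follows is a toy model of the cache behaviour described above. It is not the demonstration from the original comment and not Numba's actual dispatcher. TypecodeRegistry and ToyDispatcher are hypothetical names made up for this sketch, and the keys are plain tuples. The only assumption carried over from the discussion is that the fast path matches on integer typecodes, so arguments whose typecodes change on every call can never hit it.

```python
import itertools


class TypecodeRegistry:
    """Hypothetical: hands out one integer typecode per distinct type key."""

    def __init__(self):
        self._codes = {}
        self._counter = itertools.count(1)

    def typecode(self, key):
        if key not in self._codes:
            self._codes[key] = next(self._counter)
        return self._codes[key]


class ToyDispatcher:
    """Hypothetical: caches a 'compiled' entry per tuple of typecodes."""

    def __init__(self, registry):
        self.registry = registry
        self.fast_cache = {}
        self.slow_lookups = 0

    def call(self, arg_keys):
        codes = tuple(self.registry.typecode(k) for k in arg_keys)
        if codes not in self.fast_cache:
            # Slow path: no exact typecode match, so resolve again.
            self.slow_lookups += 1
            self.fast_cache[codes] = "compiled"
        return self.fast_cache[codes]


registry = TypecodeRegistry()

# "Typecode stable": the key of the omitted argument is the same every call.
stable = ToyDispatcher(registry)
for _ in range(3):
    stable.call([("float64",), ("omitted", float, 1.0)])

# "Typecode unstable": the key embeds a fresh identity on every call,
# modelled here with a new object() each time.
unstable = ToyDispatcher(registry)
for _ in range(3):
    unstable.call([("float64",), ("omitted", float, object())])

print("stable slow lookups:", stable.slow_lookups)
print("unstable slow lookups:", unstable.slow_lookups)
```

In this model the stable dispatcher takes the slow path once and the unstable one takes it on every call, which is the shape of the func versus func2 timings. It says nothing about why the real keys are stable for func and not for func2.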
Outstanding questions: