After the first function with default arguments is compiled, all subsequently compiled functions with default arguments are very slow to run

  • I have tried using the latest released version of Numba
  • I have included below a minimal working reproducer

This is a really weird bug that seems related to the closed issue #2029. The fix from that issue seems to stop working after Numba compiles its first function with default arguments.

The following code reproduces the issue:

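import numba

# First jitted function with default arguments: calls are fast.
@numba.njit
def func(x, y=1.0, z=1):
    return x+x + y + z
func(2.2)  # trigger compilation before timing
%timeit func(2.2)  # IPython/Jupyter magic

# Identical function compiled second: calls are much slower.
@numba.njit
def func2(x, y=1.0, z=1):
    return x+x + y + z
func2(2.2)  # trigger compilation before timing
%timeit func2(2.2)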

The output looks something like the following. The slowdown also affects any further functions with default arguments that are compiled afterwards.

294 ns ± 2.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
27.2 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
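For anyone who wants to check this outside IPython, here is a minimal sketch that uses the standard `timeit` module instead of `%timeit`. The helper names (`best_ns_per_call`, `slowdown`, `run_reproducer`) are made up for illustration and are not part of Numba. The extra call with all arguments passed explicitly is only a diagnostic to see whether the slowdown is tied to omitted defaults; it is not a confirmed workaround. The lambda wrapper adds a little overhead per call, so absolute numbers will not match `%timeit` exactly.

```python
import timeit


def best_ns_per_call(fn, *args, number=10_000, repeat=7):
    """Return the best observed time per call, in nanoseconds."""
    timer = timeit.Timer(lambda: fn(*args))
    # Take the minimum over several repeats to reduce scheduling noise.
    best_total_s = min(timer.repeat(repeat=repeat, number=number))
    return best_total_s / number * 1e9


def slowdown(fast_ns, slow_ns):
    """Ratio of the slower timing to the faster one."""
    return slow_ns / fast_ns


def run_reproducer():
    """Time func and func2 from the report; requires Numba to be installed."""
    import numba  # imported lazily so the helpers above work without Numba

    results = {}

    @numba.njit
    def func(x, y=1.0, z=1):
        return x + x + y + z

    func(2.2)  # trigger compilation before timing, as in the report
    results["func, defaults omitted"] = best_ns_per_call(func, 2.2)

    @numba.njit
    def func2(x, y=1.0, z=1):
        return x + x + y + z

    func2(2.2)  # trigger compilation before timing
    results["func2, defaults omitted"] = best_ns_per_call(func2, 2.2)

    # Diagnostic only: pass every argument explicitly. Warm up first so the
    # compilation of this new signature is not included in the timing.
    func2(2.2, 1.0, 1)
    results["func2, all args explicit"] = best_ns_per_call(func2, 2.2, 1.0, 1)
    return results


# Ratio implied by the timings quoted above: 294 ns versus 27.2 µs.
reported_ratio = slowdown(294.0, 27.2e3)
```

With the numbers from the report, `reported_ratio` comes out at roughly 92, so the second function is about two orders of magnitude slower per call. Calling `run_reproducer()` in a fresh interpreter returns the three timings as a dictionary for comparison.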

By the way, I have found several cases where Numba performs SIMD optimizations that make the code much faster than what a C++ compiler produces. With enough coaxing, MSVC will indeed perform the optimizations in this case, but it was very challenging to get working something that Numba did without even letting me know!

On the other hand, the MSVC experience involved compiler flags to get debug messages, which slowed compilation way down, and MSVC still made me do things that Numba/LLVM didn’t need me to do at all. It really feels like a mediocre Python developer can surpass a pretty good C++ programmer these days, thanks to all of your incredible work!

Full output with NUMBA_DEBUG=1 (output for both function compilations):
================================================================================
--------------------------FUNCTION OPTIMIZED DUMP nrt---------------------------
; ModuleID = 'nrt'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.const.picklebuf.140553090603008 = internal constant { i8*, i32, i8* } { i8* getelementptr inbounds ([102 x i8], [102 x i8]* @.const.pickledata.140553090603008, i32 0, i32 0), i32 102, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.const.pickledata.140553090603008.sha1, i32 0, i32 0) }
@.const.pickledata.140553090603008 = internal constant [102 x i8] c"\80\04\95[\00\00\00\00\00\00\00\8C\08builtins\94\8C\0CRuntimeError\94\93\94\8C6numba jitted function aborted due to unresolved symbol\94\85\94N\87\94."
@.const.pickledata.140553090603008.sha1 = internal constant [20 x i8] c"\97\BE\DC\DF\EC\8E\80\B7\09>P\CE%\EDV\F7r\0E\0C\9C"

define i64 @nrt_atomic_add(i64* %.1) {
.3:
  %.4 = atomicrmw add i64* %.1, i64 1 monotonic
  %.5 = add i64 %.4, 1
  ret i64 %.5
}

define i64 @nrt_atomic_sub(i64* %.1) {
.3:
  %.4 = atomicrmw sub i64* %.1, i64 1 monotonic
  %.5 = sub i64 %.4, 1
  ret i64 %.5
}

define i32 @nrt_atomic_cas(i64* %.1, i64 %.2, i64 %.3, i64* %.4) {
.6:
  %.7 = cmpxchg i64* %.1, i64 %.2, i64 %.3 monotonic monotonic
  %.8 = extractvalue { i64, i1 } %.7, 0
  %.9 = extractvalue { i64, i1 } %.7, 1
  store i64 %.8, i64* %.4
  %.11 = zext i1 %.9 to i32
  ret i32 %.11
}

define i8* @NRT_MemInfo_data_fast(i8* %.1) {
.3:
  %.4 = bitcast i8* %.1 to { i64, i8*, i8*, i8*, i64 }*
  %.5 = getelementptr { i64, i8*, i8*, i8*, i64 }, { i64, i8*, i8*, i8*, i64 }* %.4, i32 0, i32 3
  %.6 = load i8*, i8** %.5
  ret i8* %.6
}

; Function Attrs: noinline
define void @NRT_incref(i8* %.1) #0 {
.3:
  %.4 = icmp eq i8* %.1, null
  br i1 %.4, label %.3.if, label %.3.endif, !prof !0

.3.if:                                            ; preds = %.3
  ret void

.3.endif:                                         ; preds = %.3
  %.7 = bitcast i8* %.1 to i64*
  %.8 = call i64 @nrt_atomic_add(i64* %.7)
  ret void
}

; Function Attrs: noinline
define void @NRT_decref(i8* %.1) #0 {
.3:
  %.4 = icmp eq i8* %.1, null
  br i1 %.4, label %.3.if, label %.3.endif, !prof !0

.3.if:                                            ; preds = %.3.endif.if, %.3.endif, %.3
  ret void

.3.endif:                                         ; preds = %.3
  fence release
  %.8 = bitcast i8* %.1 to i64*
  %.9 = call i64 @nrt_atomic_sub(i64* %.8)
  %.10 = icmp eq i64 %.9, 0
  br i1 %.10, label %.3.endif.if, label %.3.if, !prof !0

.3.endif.if:                                      ; preds = %.3.endif
  fence acquire
  call void @NRT_MemInfo_call_dtor(i8* %.1)
  br label %.3.if
}

declare void @NRT_MemInfo_call_dtor(i8*)

define i32 @nrt_unresolved_abort(i8** %.1, { i8*, i32, i8* }** %.2) {
.4:
  store { i8*, i32, i8* }* @.const.picklebuf.140553090603008, { i8*, i32, i8* }** %.2
  ret i32 1
}

attributes #0 = { noinline }

!0 = !{!"branch_weights", i32 1, i32 99}

================================================================================
================================================================================
-------------------------------OPTIMIZED DUMP nrt-------------------------------
; ModuleID = 'nrt'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.const.picklebuf.140553090603008 = internal constant { i8*, i32, i8* } { i8* getelementptr inbounds ([102 x i8], [102 x i8]* @.const.pickledata.140553090603008, i32 0, i32 0), i32 102, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.const.pickledata.140553090603008.sha1, i32 0, i32 0) }
@.const.pickledata.140553090603008 = internal constant [102 x i8] c"\80\04\95[\00\00\00\00\00\00\00\8C\08builtins\94\8C\0CRuntimeError\94\93\94\8C6numba jitted function aborted due to unresolved symbol\94\85\94N\87\94."
@.const.pickledata.140553090603008.sha1 = internal constant [20 x i8] c"\97\BE\DC\DF\EC\8E\80\B7\09>P\CE%\EDV\F7r\0E\0C\9C"

; Function Attrs: nofree norecurse nounwind
define i64 @nrt_atomic_add(i64* nocapture %.1) local_unnamed_addr #0 {
.3:
  %.4 = atomicrmw add i64* %.1, i64 1 monotonic
  %.5 = add i64 %.4, 1
  ret i64 %.5
}

; Function Attrs: nofree norecurse nounwind
define i64 @nrt_atomic_sub(i64* nocapture %.1) local_unnamed_addr #0 {
.3:
  %.4 = atomicrmw sub i64* %.1, i64 1 monotonic
  %.5 = add i64 %.4, -1
  ret i64 %.5
}

; Function Attrs: nofree norecurse nounwind
define i32 @nrt_atomic_cas(i64* nocapture %.1, i64 %.2, i64 %.3, i64* nocapture %.4) local_unnamed_addr #0 {
.6:
  %.7 = cmpxchg i64* %.1, i64 %.2, i64 %.3 monotonic monotonic
  %.8 = extractvalue { i64, i1 } %.7, 0
  %.9 = extractvalue { i64, i1 } %.7, 1
  store i64 %.8, i64* %.4, align 8
  %.11 = zext i1 %.9 to i32
  ret i32 %.11
}

; Function Attrs: norecurse nounwind readonly
define i8* @NRT_MemInfo_data_fast(i8* nocapture readonly %.1) local_unnamed_addr #1 {
.3:
  %.5 = getelementptr i8, i8* %.1, i64 24
  %0 = bitcast i8* %.5 to i8**
  %.6 = load i8*, i8** %0, align 8
  ret i8* %.6
}

; Function Attrs: nofree noinline norecurse nounwind
define void @NRT_incref(i8* %.1) local_unnamed_addr #2 {
.3:
  %.4 = icmp eq i8* %.1, null
  br i1 %.4, label %.3.if, label %.3.endif, !prof !0

.3.if:                                            ; preds = %.3
  ret void

.3.endif:                                         ; preds = %.3
  %.7 = bitcast i8* %.1 to i64*
  %.4.i = atomicrmw add i64* %.7, i64 1 monotonic
  ret void
}

; Function Attrs: noinline
define void @NRT_decref(i8* %.1) local_unnamed_addr #3 {
.3:
  %.4 = icmp eq i8* %.1, null
  br i1 %.4, label %.3.if, label %.3.endif, !prof !0

.3.if:                                            ; preds = %.3.endif, %.3
  ret void

.3.endif:                                         ; preds = %.3
  fence release
  %.8 = bitcast i8* %.1 to i64*
  %.4.i = atomicrmw sub i64* %.8, i64 1 monotonic
  %.10 = icmp eq i64 %.4.i, 1
  br i1 %.10, label %.3.endif.if, label %.3.if, !prof !0

.3.endif.if:                                      ; preds = %.3.endif
  fence acquire
  tail call void @NRT_MemInfo_call_dtor(i8* nonnull %.1)
  ret void
}

declare void @NRT_MemInfo_call_dtor(i8*) local_unnamed_addr

; Function Attrs: nofree norecurse nounwind writeonly
define i32 @nrt_unresolved_abort(i8** nocapture readnone %.1, { i8*, i32, i8* }** nocapture %.2) local_unnamed_addr #4 {
.4:
  store { i8*, i32, i8* }* @.const.picklebuf.140553090603008, { i8*, i32, i8* }** %.2, align 8
  ret i32 1
}

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #5

attributes #0 = { nofree norecurse nounwind }
attributes #1 = { norecurse nounwind readonly }
attributes #2 = { nofree noinline norecurse nounwind }
attributes #3 = { noinline }
attributes #4 = { nofree norecurse nounwind writeonly }
attributes #5 = { nounwind }

!0 = !{!"branch_weights", i32 1, i32 99}

================================================================================
================================================================================
----------------------------------ASSEMBLY nrt----------------------------------
	.text
	.file	"<string>"
	.globl	nrt_atomic_add
	.p2align	4, 0x90
	.type	nrt_atomic_add,@function
nrt_atomic_add:
	movl	$1, %eax
	lock		xaddq	%rax, (%rdi)
	incq	%rax
	retq
.Lfunc_end0:
	.size	nrt_atomic_add, .Lfunc_end0-nrt_atomic_add

	.globl	nrt_atomic_sub
	.p2align	4, 0x90
	.type	nrt_atomic_sub,@function
nrt_atomic_sub:
	movq	$-1, %rax
	lock		xaddq	%rax, (%rdi)
	decq	%rax
	retq
.Lfunc_end1:
	.size	nrt_atomic_sub, .Lfunc_end1-nrt_atomic_sub

	.globl	nrt_atomic_cas
	.p2align	4, 0x90
	.type	nrt_atomic_cas,@function
nrt_atomic_cas:
	movq	%rsi, %rax
	xorl	%esi, %esi
	lock		cmpxchgq	%rdx, (%rdi)
	sete	%sil
	movq	%rax, (%rcx)
	movl	%esi, %eax
	retq
.Lfunc_end2:
	.size	nrt_atomic_cas, .Lfunc_end2-nrt_atomic_cas

	.globl	NRT_MemInfo_data_fast
	.p2align	4, 0x90
	.type	NRT_MemInfo_data_fast,@function
NRT_MemInfo_data_fast:
	movq	24(%rdi), %rax
	retq
.Lfunc_end3:
	.size	NRT_MemInfo_data_fast, .Lfunc_end3-NRT_MemInfo_data_fast

	.globl	NRT_incref
	.p2align	4, 0x90
	.type	NRT_incref,@function
NRT_incref:
	testq	%rdi, %rdi
	je	.LBB4_1
	lock		incq	(%rdi)
	retq
.LBB4_1:
	retq
.Lfunc_end4:
	.size	NRT_incref, .Lfunc_end4-NRT_incref

	.globl	NRT_decref
	.p2align	4, 0x90
	.type	NRT_decref,@function
NRT_decref:
	.cfi_startproc
	testq	%rdi, %rdi
	je	.LBB5_2
	#MEMBARRIER
	lock		decq	(%rdi)
	je	.LBB5_3
.LBB5_2:
	retq
.LBB5_3:
	#MEMBARRIER
	movabsq	$NRT_MemInfo_call_dtor, %rax
	jmpq	*%rax
.Lfunc_end5:
	.size	NRT_decref, .Lfunc_end5-NRT_decref
	.cfi_endproc

	.globl	nrt_unresolved_abort
	.p2align	4, 0x90
	.type	nrt_unresolved_abort,@function
nrt_unresolved_abort:
	movabsq	$.const.picklebuf.140553090603008, %rax
	movq	%rax, (%rsi)
	movl	$1, %eax
	retq
.Lfunc_end6:
	.size	nrt_unresolved_abort, .Lfunc_end6-nrt_unresolved_abort

	.type	.const.picklebuf.140553090603008,@object
	.section	.rodata,"a",@progbits
	.p2align	4
.const.picklebuf.140553090603008:
	.quad	.const.pickledata.140553090603008
	.long	102
	.zero	4
	.quad	.const.pickledata.140553090603008.sha1
	.size	.const.picklebuf.140553090603008, 24

	.type	.const.pickledata.140553090603008,@object
	.p2align	4
.const.pickledata.140553090603008:
	.ascii	"\200\004\225[\000\000\000\000\000\000\000\214\bbuiltins\224\214\fRuntimeError\224\223\224\2146numba jitted function aborted due to unresolved symbol\224\205\224N\207\224."
	.size	.const.pickledata.140553090603008, 102

	.type	.const.pickledata.140553090603008.sha1,@object
	.p2align	4
.const.pickledata.140553090603008.sha1:
	.ascii	"\227\276\334\337\354\216\200\267\t>P\316%\355V\367r\016\f\234"
	.size	.const.pickledata.140553090603008.sha1, 20

	.section	".note.GNU-stack","",@progbits

================================================================================
---------------------------------IR DUMP: func----------------------------------
label 0:
    x = arg(0, name=x)                       ['x']
    y = arg(1, name=y)                       ['y']
    z = arg(2, name=z)                       ['z']
    $6binary_add.2 = x + x                   ['$6binary_add.2', 'x', 'x']
    $10binary_add.4 = $6binary_add.2 + y     ['$10binary_add.4', '$6binary_add.2', 'y']
    $14binary_add.6 = $10binary_add.4 + z    ['$10binary_add.4', '$14binary_add.6', 'z']
    $16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
    return $16return_value.7                 ['$16return_value.7']

---------------------------------IR DUMP: func----------------------------------
label 0:
    x = arg(0, name=x)                       ['x']
    y = arg(1, name=y)                       ['y']
    z = arg(2, name=z)                       ['z']
    $6binary_add.2 = x + x                   ['$6binary_add.2', 'x', 'x']
    $10binary_add.4 = $6binary_add.2 + y     ['$10binary_add.4', '$6binary_add.2', 'y']
    $14binary_add.6 = $10binary_add.4 + z    ['$10binary_add.4', '$14binary_add.6', 'z']
    $16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
    return $16return_value.7                 ['$16return_value.7']

-------------------------------SSA IR DUMP: func--------------------------------
label 0:
    x = arg(0, name=x)                       ['x']
    y = arg(1, name=y)                       ['y']
    z = arg(2, name=z)                       ['z']
    $6binary_add.2 = x + x                   ['$6binary_add.2', 'x', 'x']
    $10binary_add.4 = $6binary_add.2 + y     ['$10binary_add.4', '$6binary_add.2', 'y']
    $14binary_add.6 = $10binary_add.4 + z    ['$10binary_add.4', '$14binary_add.6', 'z']
    $16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
    return $16return_value.7                 ['$16return_value.7']

-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
 $14binary_add.6 := float64,
 $16return_value.7 := float64,
 $6binary_add.2 := float64,
 arg.x := float64,
 arg.y := omitted(default=1.0),
 arg.z := omitted(default=1),
 x := float64,
 y := float64,
 z := Literal[int](1)]
-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
 $14binary_add.6 := float64,
 $16return_value.7 := float64,
 $6binary_add.2 := float64,
 arg.x := float64,
 arg.y := omitted(default=1.0),
 arg.z := omitted(default=1),
 x := float64,
 y := float64,
 z := Literal[int](1)]
---------------------------------Variable types---------------------------------
{'$10binary_add.4': float64,
 '$14binary_add.6': float64,
 '$16return_value.7': float64,
 '$6binary_add.2': float64,
 'arg.x': float64,
 'arg.y': omitted(default=1.0),
 'arg.z': omitted(default=1),
 'x': float64,
 'y': float64,
 'z': Literal[int](1)}
----------------------------------Return type-----------------------------------
float64
-----------------------------------Call types-----------------------------------
{$10binary_add.4 + z: (float64, float64) -> float64,
 $6binary_add.2 + y: (float64, float64) -> float64,
 x + x: (float64, float64) -> float64}
--------------------LLVM DUMP <function descriptor 'func$1'>--------------------
; ModuleID = "func$1"
target triple = "x86_64-unknown-linux-gnu"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

@"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
define i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %"retptr", {i8*, i32, i8*}** noalias nocapture %"excinfo", double %"arg.x") 
{
entry:
  %"x" = alloca double
  store double 0.0, double* %"x"
  %"y" = alloca double
  store double 0.0, double* %"y"
  %"z" = alloca i64
  store i64 0, i64* %"z"
  %"$6binary_add.2" = alloca double
  store double 0.0, double* %"$6binary_add.2"
  %"$10binary_add.4" = alloca double
  store double 0.0, double* %"$10binary_add.4"
  %"$14binary_add.6" = alloca double
  store double 0.0, double* %"$14binary_add.6"
  %"$16return_value.7" = alloca double
  store double 0.0, double* %"$16return_value.7"
  br label %"B0"
B0:
  %".6" = load double, double* %"x"
  store double %"arg.x", double* %"x"
  %".9" = load double, double* %"y"
  store double 0x3ff0000000000000, double* %"y"
  %".12" = load i64, i64* %"z"
  store i64 1, i64* %"z"
  %".14" = load double, double* %"x"
  %".15" = load double, double* %"x"
  %".16" = fadd double %".14", %".15"
  %".18" = load double, double* %"$6binary_add.2"
  store double %".16", double* %"$6binary_add.2"
  %".20" = load double, double* %"x"
  store double 0.0, double* %"x"
  %".22" = load double, double* %"$6binary_add.2"
  %".23" = load double, double* %"y"
  %".24" = fadd double %".22", %".23"
  %".26" = load double, double* %"$10binary_add.4"
  store double %".24", double* %"$10binary_add.4"
  %".28" = load double, double* %"y"
  store double 0.0, double* %"y"
  %".30" = load double, double* %"$6binary_add.2"
  store double 0.0, double* %"$6binary_add.2"
  %".32" = load double, double* %"$10binary_add.4"
  %".33" = load i64, i64* %"z"
  %".34" = sitofp i64 1 to double
  %".35" = fadd double %".32", %".34"
  %".37" = load double, double* %"$14binary_add.6"
  store double %".35", double* %"$14binary_add.6"
  %".39" = load i64, i64* %"z"
  store i64 0, i64* %"z"
  %".41" = load double, double* %"$10binary_add.4"
  store double 0.0, double* %"$10binary_add.4"
  %".43" = load double, double* %"$14binary_add.6"
  %".45" = load double, double* %"$16return_value.7"
  store double %".43", double* %"$16return_value.7"
  %".47" = load double, double* %"$14binary_add.6"
  store double 0.0, double* %"$14binary_add.6"
  %".49" = load double, double* %"$16return_value.7"
  store double %".49", double* %"retptr"
  ret i32 0
}

================================================================================
================================================================================
--------------------------FUNCTION OPTIMIZED DUMP func--------------------------
; ModuleID = 'func'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
@.const.func = internal constant [5 x i8] c"func\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [112 x i8] c"missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"
@_Py_NoneStruct = external global i8
@PyExc_StopIteration = external global i8
@PyExc_SystemError = external global i8
@".const.unknown error when calling native function" = internal constant [43 x i8] c"unknown error when calling native function\00"

define i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture %excinfo, double %arg.x) {
entry:
  %.16 = fadd double %arg.x, %arg.x
  %.24 = fadd double %.16, 1.000000e+00
  %.35 = fadd double %.24, 1.000000e+00
  store double %.35, double* %retptr
  ret i32 0
}

define i8* @"_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* %py_closure, i8* %py_args, i8* %py_kws) {
entry:
  %.5 = alloca i8*
  %.6 = alloca i8*
  %.7 = alloca i8*
  %.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.const.func, i32 0, i32 0), i64 3, i64 3, i8** %.5, i8** %.6, i8** %.7)
  %.9 = icmp eq i32 %.8, 0
  %.31 = alloca double
  store double 0.000000e+00, double* %.31
  %excinfo = alloca { i8*, i32, i8* }*
  store { i8*, i32, i8* }* null, { i8*, i32, i8* }** %excinfo
  br i1 %.9, label %entry.if, label %entry.endif, !prof !0

entry.if:                                         ; preds = %entry.endif.endif.endif.endif.endif, %entry.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.if.if, %entry.endif.endif, %entry.endif.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.endif.endif.endif, %entry
  ret i8* null

entry.endif:                                      ; preds = %entry
  %.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
  %.14 = ptrtoint i8* %.13 to i64
  %.15 = add i64 %.14, 16
  %.16 = inttoptr i64 %.15 to i8*
  %.18 = icmp eq i8* null, %.13
  br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0

entry.endif.if:                                   ; preds = %entry.endif
  call void @PyErr_SetString(i8* @PyExc_RuntimeError, i8* getelementptr inbounds ([112 x i8], [112 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i32 0, i32 0))
  ret i8* null

entry.endif.endif:                                ; preds = %entry.endif
  %.22 = load i8*, i8** %.5
  %.23 = call i8* @PyNumber_Float(i8* %.22)
  %.24 = call double @PyFloat_AsDouble(i8* %.23)
  call void @Py_DecRef(i8* %.23)
  %.26 = call i8* @PyErr_Occurred()
  %.27 = icmp ne i8* null, %.26
  br i1 %.27, label %entry.if, label %entry.endif.endif.endif, !prof !0

entry.endif.endif.endif:                          ; preds = %entry.endif.endif
  store double 0.000000e+00, double* %.31
  %.35 = call i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* %.31, { i8*, i32, i8* }** %excinfo, double %.24)
  %.36 = load { i8*, i32, i8* }*, { i8*, i32, i8* }** %excinfo
  %.37 = icmp eq i32 %.35, 0
  %.38 = icmp eq i32 %.35, -2
  %.41 = or i1 %.37, %.38
  %.43 = icmp sge i32 %.35, 1
  %.45 = load double, double* %.31
  switch i32 %.35, label %entry.endif.endif.endif.endif [
    i32 -2, label %entry.endif.endif.endif.if
    i32 0, label %entry.endif.endif.endif.if
  ]

entry.endif.endif.endif.if:                       ; preds = %entry.endif.endif.endif, %entry.endif.endif.endif
  br i1 %.38, label %entry.endif.endif.endif.if.if, label %entry.endif.endif.endif.if.endif

entry.endif.endif.endif.endif:                    ; preds = %entry.endif.endif.endif
  br i1 %.43, label %entry.endif.endif.endif.endif.if, label %entry.endif.endif.endif.endif.endif

entry.endif.endif.endif.if.if:                    ; preds = %entry.endif.endif.endif.if
  call void @Py_IncRef(i8* @_Py_NoneStruct)
  ret i8* @_Py_NoneStruct

entry.endif.endif.endif.if.endif:                 ; preds = %entry.endif.endif.endif.if
  %.50 = call i8* @PyFloat_FromDouble(double %.45)
  ret i8* %.50

entry.endif.endif.endif.endif.if:                 ; preds = %entry.endif.endif.endif.endif
  call void @PyErr_Clear()
  %.55 = load { i8*, i32, i8* }, { i8*, i32, i8* }* %.36
  %.56 = extractvalue { i8*, i32, i8* } %.55, 0
  %.58 = extractvalue { i8*, i32, i8* } %.55, 1
  %.60 = extractvalue { i8*, i32, i8* } %.55, 2
  %.61 = call i8* @numba_unpickle(i8* %.56, i32 %.58, i8* %.60)
  %.62 = icmp ne i8* null, %.61
  br i1 %.62, label %entry.endif.endif.endif.endif.if.if, label %entry.if, !prof !1

entry.endif.endif.endif.endif.endif:              ; preds = %entry.endif.endif.endif.endif
  switch i32 %.35, label %entry.endif.endif.endif.endif.endif.endif.endif [
    i32 -3, label %entry.endif.endif.endif.endif.endif.if
    i32 -1, label %entry.if
  ]

entry.endif.endif.endif.endif.if.if:              ; preds = %entry.endif.endif.endif.endif.if
  call void @numba_do_raise(i8* %.61)
  br label %entry.if

entry.endif.endif.endif.endif.endif.if:           ; preds = %entry.endif.endif.endif.endif.endif
  call void @PyErr_SetNone(i8* @PyExc_StopIteration)
  br label %entry.if

entry.endif.endif.endif.endif.endif.endif.endif:  ; preds = %entry.endif.endif.endif.endif.endif
  call void @PyErr_SetString(i8* @PyExc_SystemError, i8* getelementptr inbounds ([43 x i8], [43 x i8]* @".const.unknown error when calling native function", i32 0, i32 0))
  br label %entry.if
}

declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...)

declare void @PyErr_SetString(i8*, i8*)

declare i8* @PyNumber_Float(i8*)

declare double @PyFloat_AsDouble(i8*)

declare void @Py_DecRef(i8*)

declare i8* @PyErr_Occurred()

declare void @Py_IncRef(i8*)

declare i8* @PyFloat_FromDouble(double)

declare void @PyErr_Clear()

declare i8* @numba_unpickle(i8*, i32, i8*)

declare void @numba_do_raise(i8*)

declare void @PyErr_SetNone(i8*)

!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}

================================================================================
================================================================================
------------------------------OPTIMIZED DUMP func-------------------------------
; ModuleID = 'func'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common local_unnamed_addr global i8* null
@.const.func = internal constant [5 x i8] c"func\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [112 x i8] c"missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"

; Function Attrs: nofree norecurse nounwind writeonly
define i32 @"_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, double %arg.x) local_unnamed_addr #0 {
entry:
  %.16 = fadd double %arg.x, %arg.x
  %.24 = fadd double %.16, 1.000000e+00
  %.35 = fadd double %.24, 1.000000e+00
  store double %.35, double* %retptr, align 8
  ret i32 0
}

define i8* @"_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
  %.5 = alloca i8*, align 8
  %.6 = alloca i8*, align 8
  %.7 = alloca i8*, align 8
  %.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.const.func, i64 0, i64 0), i64 3, i64 3, i8** nonnull %.5, i8** nonnull %.6, i8** nonnull %.7)
  %.9 = icmp eq i32 %.8, 0
  br i1 %.9, label %entry.if, label %entry.endif, !prof !0

entry.if:                                         ; preds = %entry.endif.endif, %entry
  ret i8* null

entry.endif:                                      ; preds = %entry
  %.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", align 8
  %.18 = icmp eq i8* %.13, null
  br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0

entry.endif.if:                                   ; preds = %entry.endif
  call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([112 x i8], [112 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i64 0, i64 0))
  ret i8* null

entry.endif.endif:                                ; preds = %entry.endif
  %.22 = load i8*, i8** %.5, align 8
  %.23 = call i8* @PyNumber_Float(i8* %.22)
  %.24 = call double @PyFloat_AsDouble(i8* %.23)
  call void @Py_DecRef(i8* %.23)
  %.26 = call i8* @PyErr_Occurred()
  %.27 = icmp eq i8* %.26, null
  br i1 %.27, label %entry.endif.endif.endif, label %entry.if, !prof !1

entry.endif.endif.endif:                          ; preds = %entry.endif.endif
  %.16.i = fadd double %.24, %.24
  %.24.i = fadd double %.16.i, 1.000000e+00
  %.35.i = fadd double %.24.i, 1.000000e+00
  %.50 = call i8* @PyFloat_FromDouble(double %.35.i)
  ret i8* %.50
}

declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr

declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr

declare i8* @PyNumber_Float(i8*) local_unnamed_addr

declare double @PyFloat_AsDouble(i8*) local_unnamed_addr

declare void @Py_DecRef(i8*) local_unnamed_addr

declare i8* @PyErr_Occurred() local_unnamed_addr

declare i8* @PyFloat_FromDouble(double) local_unnamed_addr

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1

attributes #0 = { nofree norecurse nounwind writeonly }
attributes #1 = { nounwind }

!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}

================================================================================
================================================================================
---------------------------------ASSEMBLY func----------------------------------
	.text
	.file	"<string>"
	.section	.rodata.cst8,"aM",@progbits,8
	.p2align	3
.LCPI0_0:
	.quad	4607182418800017408
	.text
	.globl	_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
	.p2align	4, 0x90
	.type	_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
	vaddsd	%xmm0, %xmm0, %xmm0
	movabsq	$.LCPI0_0, %rax
	vmovsd	(%rax), %xmm1
	vaddsd	%xmm1, %xmm0, %xmm0
	vaddsd	%xmm1, %xmm0, %xmm0
	vmovsd	%xmm0, (%rdi)
	xorl	%eax, %eax
	retq
.Lfunc_end0:
	.size	_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end0-_ZN8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29

	.section	.rodata.cst8,"aM",@progbits,8
	.p2align	3
.LCPI1_0:
	.quad	4607182418800017408
	.text
	.globl	_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
	.p2align	4, 0x90
	.type	_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	subq	$48, %rsp
	.cfi_def_cfa_offset 64
	.cfi_offset %rbx, -16
	movq	%rsi, %rdi
	leaq	32(%rsp), %rax
	movq	%rax, (%rsp)
	movabsq	$.const.func, %rsi
	movabsq	$PyArg_UnpackTuple, %rbx
	leaq	24(%rsp), %r8
	leaq	40(%rsp), %r9
	movl	$3, %edx
	movl	$3, %ecx
	xorl	%eax, %eax
	callq	*%rbx
	testl	%eax, %eax
	je	.LBB1_1
	movabsq	$_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, %rax
	cmpq	$0, (%rax)
	je	.LBB1_4
	movq	24(%rsp), %rdi
	movabsq	$PyNumber_Float, %rax
	callq	*%rax
	movq	%rax, %rbx
	movabsq	$PyFloat_AsDouble, %rax
	movq	%rbx, %rdi
	callq	*%rax
	vmovsd	%xmm0, 16(%rsp)
	movabsq	$Py_DecRef, %rax
	movq	%rbx, %rdi
	callq	*%rax
	movabsq	$PyErr_Occurred, %rax
	callq	*%rax
	testq	%rax, %rax
	jne	.LBB1_1
	vmovsd	16(%rsp), %xmm0
	vaddsd	%xmm0, %xmm0, %xmm0
	movabsq	$.LCPI1_0, %rax
	vmovsd	(%rax), %xmm1
	vaddsd	%xmm1, %xmm0, %xmm0
	vaddsd	%xmm1, %xmm0, %xmm0
	movabsq	$PyFloat_FromDouble, %rax
	callq	*%rax
	addq	$48, %rsp
	.cfi_def_cfa_offset 16
	popq	%rbx
	.cfi_def_cfa_offset 8
	retq
.LBB1_4:
	.cfi_def_cfa_offset 64
	movabsq	$PyExc_RuntimeError, %rdi
	movabsq	$".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", %rsi
	movabsq	$PyErr_SetString, %rax
	callq	*%rax
.LBB1_1:
	xorl	%eax, %eax
	addq	$48, %rsp
	.cfi_def_cfa_offset 16
	popq	%rbx
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end1:
	.size	_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end1-_ZN7cpython8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
	.cfi_endproc

	.type	_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@object
	.comm	_ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,8,8
	.type	.const.func,@object
	.section	.rodata,"a",@progbits
.const.func:
	.asciz	"func"
	.size	.const.func, 5

	.type	".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29",@object
	.p2align	4
".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29":
	.asciz	"missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
	.size	".const.missing Environment: _ZN08NumbaEnv8__main__8func$241EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", 112

	.section	".note.GNU-stack","",@progbits

================================================================================
---------------------------------IR DUMP: func2---------------------------------
label 0:
    x = arg(0, name=x)                       ['x']
    y = arg(1, name=y)                       ['y']
    z = arg(2, name=z)                       ['z']
    $6binary_add.2 = x + x                   ['$6binary_add.2', 'x', 'x']
    $10binary_add.4 = $6binary_add.2 + y     ['$10binary_add.4', '$6binary_add.2', 'y']
    $14binary_add.6 = $10binary_add.4 + z    ['$10binary_add.4', '$14binary_add.6', 'z']
    $16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
    return $16return_value.7                 ['$16return_value.7']

---------------------------------IR DUMP: func2---------------------------------
label 0:
    x = arg(0, name=x)                       ['x']
    y = arg(1, name=y)                       ['y']
    z = arg(2, name=z)                       ['z']
    $6binary_add.2 = x + x                   ['$6binary_add.2', 'x', 'x']
    $10binary_add.4 = $6binary_add.2 + y     ['$10binary_add.4', '$6binary_add.2', 'y']
    $14binary_add.6 = $10binary_add.4 + z    ['$10binary_add.4', '$14binary_add.6', 'z']
    $16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
    return $16return_value.7                 ['$16return_value.7']

-------------------------------SSA IR DUMP: func2-------------------------------
label 0:
    x = arg(0, name=x)                       ['x']
    y = arg(1, name=y)                       ['y']
    z = arg(2, name=z)                       ['z']
    $6binary_add.2 = x + x                   ['$6binary_add.2', 'x', 'x']
    $10binary_add.4 = $6binary_add.2 + y     ['$10binary_add.4', '$6binary_add.2', 'y']
    $14binary_add.6 = $10binary_add.4 + z    ['$10binary_add.4', '$14binary_add.6', 'z']
    $16return_value.7 = cast(value=$14binary_add.6) ['$14binary_add.6', '$16return_value.7']
    return $16return_value.7                 ['$16return_value.7']

-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
 $14binary_add.6 := float64,
 $16return_value.7 := float64,
 $6binary_add.2 := float64,
 arg.x := float64,
 arg.y := omitted(default=1.0),
 arg.z := omitted(default=1),
 x := float64,
 y := float64,
 z := Literal[int](1)]
-----------------------------------propagate------------------------------------
---- type variables ----
[$10binary_add.4 := float64,
 $14binary_add.6 := float64,
 $16return_value.7 := float64,
 $6binary_add.2 := float64,
 arg.x := float64,
 arg.y := omitted(default=1.0),
 arg.z := omitted(default=1),
 x := float64,
 y := float64,
 z := Literal[int](1)]
---------------------------------Variable types---------------------------------
{'$10binary_add.4': float64,
 '$14binary_add.6': float64,
 '$16return_value.7': float64,
 '$6binary_add.2': float64,
 'arg.x': float64,
 'arg.y': omitted(default=1.0),
 'arg.z': omitted(default=1),
 'x': float64,
 'y': float64,
 'z': Literal[int](1)}
----------------------------------Return type-----------------------------------
float64
-----------------------------------Call types-----------------------------------
{$10binary_add.4 + z: (float64, float64) -> float64,
 $6binary_add.2 + y: (float64, float64) -> float64,
 x + x: (float64, float64) -> float64}
-------------------LLVM DUMP <function descriptor 'func2$2'>--------------------
; ModuleID = "func2$2"
target triple = "x86_64-unknown-linux-gnu"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

@"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
define i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %"retptr", {i8*, i32, i8*}** noalias nocapture %"excinfo", double %"arg.x") 
{
entry:
  %"x" = alloca double
  store double 0.0, double* %"x"
  %"y" = alloca double
  store double 0.0, double* %"y"
  %"z" = alloca i64
  store i64 0, i64* %"z"
  %"$6binary_add.2" = alloca double
  store double 0.0, double* %"$6binary_add.2"
  %"$10binary_add.4" = alloca double
  store double 0.0, double* %"$10binary_add.4"
  %"$14binary_add.6" = alloca double
  store double 0.0, double* %"$14binary_add.6"
  %"$16return_value.7" = alloca double
  store double 0.0, double* %"$16return_value.7"
  br label %"B0"
B0:
  %".6" = load double, double* %"x"
  store double %"arg.x", double* %"x"
  %".9" = load double, double* %"y"
  store double 0x3ff0000000000000, double* %"y"
  %".12" = load i64, i64* %"z"
  store i64 1, i64* %"z"
  %".14" = load double, double* %"x"
  %".15" = load double, double* %"x"
  %".16" = fadd double %".14", %".15"
  %".18" = load double, double* %"$6binary_add.2"
  store double %".16", double* %"$6binary_add.2"
  %".20" = load double, double* %"x"
  store double 0.0, double* %"x"
  %".22" = load double, double* %"$6binary_add.2"
  %".23" = load double, double* %"y"
  %".24" = fadd double %".22", %".23"
  %".26" = load double, double* %"$10binary_add.4"
  store double %".24", double* %"$10binary_add.4"
  %".28" = load double, double* %"y"
  store double 0.0, double* %"y"
  %".30" = load double, double* %"$6binary_add.2"
  store double 0.0, double* %"$6binary_add.2"
  %".32" = load double, double* %"$10binary_add.4"
  %".33" = load i64, i64* %"z"
  %".34" = sitofp i64 1 to double
  %".35" = fadd double %".32", %".34"
  %".37" = load double, double* %"$14binary_add.6"
  store double %".35", double* %"$14binary_add.6"
  %".39" = load i64, i64* %"z"
  store i64 0, i64* %"z"
  %".41" = load double, double* %"$10binary_add.4"
  store double 0.0, double* %"$10binary_add.4"
  %".43" = load double, double* %"$14binary_add.6"
  %".45" = load double, double* %"$16return_value.7"
  store double %".43", double* %"$16return_value.7"
  %".47" = load double, double* %"$14binary_add.6"
  store double 0.0, double* %"$14binary_add.6"
  %".49" = load double, double* %"$16return_value.7"
  store double %".49", double* %"retptr"
  ret i32 0
}

================================================================================
================================================================================
-------------------------FUNCTION OPTIMIZED DUMP func2--------------------------
; ModuleID = 'func2'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common global i8* null
@.const.func2 = internal constant [6 x i8] c"func2\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [113 x i8] c"missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"
@_Py_NoneStruct = external global i8
@PyExc_StopIteration = external global i8
@PyExc_SystemError = external global i8
@".const.unknown error when calling native function" = internal constant [43 x i8] c"unknown error when calling native function\00"

define i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture %excinfo, double %arg.x) {
entry:
  %.16 = fadd double %arg.x, %arg.x
  %.24 = fadd double %.16, 1.000000e+00
  %.35 = fadd double %.24, 1.000000e+00
  store double %.35, double* %retptr
  ret i32 0
}

define i8* @"_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* %py_closure, i8* %py_args, i8* %py_kws) {
entry:
  %.5 = alloca i8*
  %.6 = alloca i8*
  %.7 = alloca i8*
  %.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.const.func2, i32 0, i32 0), i64 3, i64 3, i8** %.5, i8** %.6, i8** %.7)
  %.9 = icmp eq i32 %.8, 0
  %.31 = alloca double
  store double 0.000000e+00, double* %.31
  %excinfo = alloca { i8*, i32, i8* }*
  store { i8*, i32, i8* }* null, { i8*, i32, i8* }** %excinfo
  br i1 %.9, label %entry.if, label %entry.endif, !prof !0

entry.if:                                         ; preds = %entry.endif.endif.endif.endif.endif, %entry.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.if.if, %entry.endif.endif, %entry.endif.endif.endif.endif.endif.if, %entry.endif.endif.endif.endif.endif.endif.endif, %entry
  ret i8* null

entry.endif:                                      ; preds = %entry
  %.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
  %.14 = ptrtoint i8* %.13 to i64
  %.15 = add i64 %.14, 16
  %.16 = inttoptr i64 %.15 to i8*
  %.18 = icmp eq i8* null, %.13
  br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0

entry.endif.if:                                   ; preds = %entry.endif
  call void @PyErr_SetString(i8* @PyExc_RuntimeError, i8* getelementptr inbounds ([113 x i8], [113 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i32 0, i32 0))
  ret i8* null

entry.endif.endif:                                ; preds = %entry.endif
  %.22 = load i8*, i8** %.5
  %.23 = call i8* @PyNumber_Float(i8* %.22)
  %.24 = call double @PyFloat_AsDouble(i8* %.23)
  call void @Py_DecRef(i8* %.23)
  %.26 = call i8* @PyErr_Occurred()
  %.27 = icmp ne i8* null, %.26
  br i1 %.27, label %entry.if, label %entry.endif.endif.endif, !prof !0

entry.endif.endif.endif:                          ; preds = %entry.endif.endif
  store double 0.000000e+00, double* %.31
  %.35 = call i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* %.31, { i8*, i32, i8* }** %excinfo, double %.24)
  %.36 = load { i8*, i32, i8* }*, { i8*, i32, i8* }** %excinfo
  %.37 = icmp eq i32 %.35, 0
  %.38 = icmp eq i32 %.35, -2
  %.41 = or i1 %.37, %.38
  %.43 = icmp sge i32 %.35, 1
  %.45 = load double, double* %.31
  switch i32 %.35, label %entry.endif.endif.endif.endif [
    i32 -2, label %entry.endif.endif.endif.if
    i32 0, label %entry.endif.endif.endif.if
  ]

entry.endif.endif.endif.if:                       ; preds = %entry.endif.endif.endif, %entry.endif.endif.endif
  br i1 %.38, label %entry.endif.endif.endif.if.if, label %entry.endif.endif.endif.if.endif

entry.endif.endif.endif.endif:                    ; preds = %entry.endif.endif.endif
  br i1 %.43, label %entry.endif.endif.endif.endif.if, label %entry.endif.endif.endif.endif.endif

entry.endif.endif.endif.if.if:                    ; preds = %entry.endif.endif.endif.if
  call void @Py_IncRef(i8* @_Py_NoneStruct)
  ret i8* @_Py_NoneStruct

entry.endif.endif.endif.if.endif:                 ; preds = %entry.endif.endif.endif.if
  %.50 = call i8* @PyFloat_FromDouble(double %.45)
  ret i8* %.50

entry.endif.endif.endif.endif.if:                 ; preds = %entry.endif.endif.endif.endif
  call void @PyErr_Clear()
  %.55 = load { i8*, i32, i8* }, { i8*, i32, i8* }* %.36
  %.56 = extractvalue { i8*, i32, i8* } %.55, 0
  %.58 = extractvalue { i8*, i32, i8* } %.55, 1
  %.60 = extractvalue { i8*, i32, i8* } %.55, 2
  %.61 = call i8* @numba_unpickle(i8* %.56, i32 %.58, i8* %.60)
  %.62 = icmp ne i8* null, %.61
  br i1 %.62, label %entry.endif.endif.endif.endif.if.if, label %entry.if, !prof !1

entry.endif.endif.endif.endif.endif:              ; preds = %entry.endif.endif.endif.endif
  switch i32 %.35, label %entry.endif.endif.endif.endif.endif.endif.endif [
    i32 -3, label %entry.endif.endif.endif.endif.endif.if
    i32 -1, label %entry.if
  ]

entry.endif.endif.endif.endif.if.if:              ; preds = %entry.endif.endif.endif.endif.if
  call void @numba_do_raise(i8* %.61)
  br label %entry.if

entry.endif.endif.endif.endif.endif.if:           ; preds = %entry.endif.endif.endif.endif.endif
  call void @PyErr_SetNone(i8* @PyExc_StopIteration)
  br label %entry.if

entry.endif.endif.endif.endif.endif.endif.endif:  ; preds = %entry.endif.endif.endif.endif.endif
  call void @PyErr_SetString(i8* @PyExc_SystemError, i8* getelementptr inbounds ([43 x i8], [43 x i8]* @".const.unknown error when calling native function", i32 0, i32 0))
  br label %entry.if
}

declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...)

declare void @PyErr_SetString(i8*, i8*)

declare i8* @PyNumber_Float(i8*)

declare double @PyFloat_AsDouble(i8*)

declare void @Py_DecRef(i8*)

declare i8* @PyErr_Occurred()

declare void @Py_IncRef(i8*)

declare i8* @PyFloat_FromDouble(double)

declare void @PyErr_Clear()

declare i8* @numba_unpickle(i8*, i32, i8*)

declare void @numba_do_raise(i8*)

declare void @PyErr_SetNone(i8*)

!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}

================================================================================
================================================================================
------------------------------OPTIMIZED DUMP func2------------------------------
; ModuleID = 'func2'
source_filename = "<string>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = common local_unnamed_addr global i8* null
@.const.func2 = internal constant [6 x i8] c"func2\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29" = internal constant [113 x i8] c"missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29\00"

; Function Attrs: nofree norecurse nounwind writeonly
define i32 @"_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(double* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, double %arg.x) local_unnamed_addr #0 {
entry:
  %.16 = fadd double %arg.x, %arg.x
  %.24 = fadd double %.16, 1.000000e+00
  %.35 = fadd double %.24, 1.000000e+00
  store double %.35, double* %retptr, align 8
  ret i32 0
}

define i8* @"_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"(i8* nocapture readnone %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
  %.5 = alloca i8*, align 8
  %.6 = alloca i8*, align 8
  %.7 = alloca i8*, align 8
  %.8 = call i32 (i8*, i8*, i64, i64, ...) @PyArg_UnpackTuple(i8* %py_args, i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.const.func2, i64 0, i64 0), i64 3, i64 3, i8** nonnull %.5, i8** nonnull %.6, i8** nonnull %.7)
  %.9 = icmp eq i32 %.8, 0
  br i1 %.9, label %entry.if, label %entry.endif, !prof !0

entry.if:                                         ; preds = %entry.endif.endif, %entry
  ret i8* null

entry.endif:                                      ; preds = %entry
  %.13 = load i8*, i8** @"_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", align 8
  %.18 = icmp eq i8* %.13, null
  br i1 %.18, label %entry.endif.if, label %entry.endif.endif, !prof !0

entry.endif.if:                                   ; preds = %entry.endif
  call void @PyErr_SetString(i8* nonnull @PyExc_RuntimeError, i8* getelementptr inbounds ([113 x i8], [113 x i8]* @".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", i64 0, i64 0))
  ret i8* null

entry.endif.endif:                                ; preds = %entry.endif
  %.22 = load i8*, i8** %.5, align 8
  %.23 = call i8* @PyNumber_Float(i8* %.22)
  %.24 = call double @PyFloat_AsDouble(i8* %.23)
  call void @Py_DecRef(i8* %.23)
  %.26 = call i8* @PyErr_Occurred()
  %.27 = icmp eq i8* %.26, null
  br i1 %.27, label %entry.endif.endif.endif, label %entry.if, !prof !1

entry.endif.endif.endif:                          ; preds = %entry.endif.endif
  %.16.i = fadd double %.24, %.24
  %.24.i = fadd double %.16.i, 1.000000e+00
  %.35.i = fadd double %.24.i, 1.000000e+00
  %.50 = call i8* @PyFloat_FromDouble(double %.35.i)
  ret i8* %.50
}

declare i32 @PyArg_UnpackTuple(i8*, i8*, i64, i64, ...) local_unnamed_addr

declare void @PyErr_SetString(i8*, i8*) local_unnamed_addr

declare i8* @PyNumber_Float(i8*) local_unnamed_addr

declare double @PyFloat_AsDouble(i8*) local_unnamed_addr

declare void @Py_DecRef(i8*) local_unnamed_addr

declare i8* @PyErr_Occurred() local_unnamed_addr

declare i8* @PyFloat_FromDouble(double) local_unnamed_addr

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1

attributes #0 = { nofree norecurse nounwind writeonly }
attributes #1 = { nounwind }

!0 = !{!"branch_weights", i32 1, i32 99}
!1 = !{!"branch_weights", i32 99, i32 1}

================================================================================
================================================================================
---------------------------------ASSEMBLY func2---------------------------------
	.text
	.file	"<string>"
	.section	.rodata.cst8,"aM",@progbits,8
	.p2align	3
.LCPI0_0:
	.quad	4607182418800017408
	.text
	.globl	_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
	.p2align	4, 0x90
	.type	_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
	vaddsd	%xmm0, %xmm0, %xmm0
	movabsq	$.LCPI0_0, %rax
	vmovsd	(%rax), %xmm1
	vaddsd	%xmm1, %xmm0, %xmm0
	vaddsd	%xmm1, %xmm0, %xmm0
	vmovsd	%xmm0, (%rdi)
	xorl	%eax, %eax
	retq
.Lfunc_end0:
	.size	_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end0-_ZN8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29

	.section	.rodata.cst8,"aM",@progbits,8
	.p2align	3
.LCPI1_0:
	.quad	4607182418800017408
	.text
	.globl	_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
	.p2align	4, 0x90
	.type	_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@function
_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	subq	$48, %rsp
	.cfi_def_cfa_offset 64
	.cfi_offset %rbx, -16
	movq	%rsi, %rdi
	leaq	32(%rsp), %rax
	movq	%rax, (%rsp)
	movabsq	$.const.func2, %rsi
	movabsq	$PyArg_UnpackTuple, %rbx
	leaq	24(%rsp), %r8
	leaq	40(%rsp), %r9
	movl	$3, %edx
	movl	$3, %ecx
	xorl	%eax, %eax
	callq	*%rbx
	testl	%eax, %eax
	je	.LBB1_1
	movabsq	$_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, %rax
	cmpq	$0, (%rax)
	je	.LBB1_4
	movq	24(%rsp), %rdi
	movabsq	$PyNumber_Float, %rax
	callq	*%rax
	movq	%rax, %rbx
	movabsq	$PyFloat_AsDouble, %rax
	movq	%rbx, %rdi
	callq	*%rax
	vmovsd	%xmm0, 16(%rsp)
	movabsq	$Py_DecRef, %rax
	movq	%rbx, %rdi
	callq	*%rax
	movabsq	$PyErr_Occurred, %rax
	callq	*%rax
	testq	%rax, %rax
	jne	.LBB1_1
	vmovsd	16(%rsp), %xmm0
	vaddsd	%xmm0, %xmm0, %xmm0
	movabsq	$.LCPI1_0, %rax
	vmovsd	(%rax), %xmm1
	vaddsd	%xmm1, %xmm0, %xmm0
	vaddsd	%xmm1, %xmm0, %xmm0
	movabsq	$PyFloat_FromDouble, %rax
	callq	*%rax
	addq	$48, %rsp
	.cfi_def_cfa_offset 16
	popq	%rbx
	.cfi_def_cfa_offset 8
	retq
.LBB1_4:
	.cfi_def_cfa_offset 64
	movabsq	$PyExc_RuntimeError, %rdi
	movabsq	$".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", %rsi
	movabsq	$PyErr_SetString, %rax
	callq	*%rax
.LBB1_1:
	xorl	%eax, %eax
	addq	$48, %rsp
	.cfi_def_cfa_offset 16
	popq	%rbx
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end1:
	.size	_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29, .Lfunc_end1-_ZN7cpython8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29
	.cfi_endproc

	.type	_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,@object
	.comm	_ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29,8,8
	.type	.const.func2,@object
	.section	.rodata,"a",@progbits
.const.func2:
	.asciz	"func2"
	.size	.const.func2, 6

	.type	".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29",@object
	.p2align	4
".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29":
	.asciz	"missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29"
	.size	".const.missing Environment: _ZN08NumbaEnv8__main__9func2$242EdN21omitted$28default$3d15_0$29E24omitted$28default$3d1$29", 113

	.section	".note.GNU-stack","",@progbits

================================================================================

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

1 reaction
CalebBell commented, May 29, 2021

I looked at this issue again this morning and made a little progress. I had noticed #6957, and it turns out that it might be related. As an experiment, I changed the key method of the Omitted class: I took “id(self._value)” out of that line, and this issue went away.

I am sure this is an important line and shouldn’t be removed, but it was nice to make a little bit of progress. I can confirm this is still an issue. Sincerely, Caleb
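
To illustrate why removing the identity from the key could make a difference, here is a minimal pure-Python sketch. It does not use Numba: `ToyOmitted` and its `key_on_identity` flag are hypothetical names invented for this illustration, and the sketch assumes CPython, where two floats built at run time are distinct objects even when they are equal.

```python
class ToyOmitted:
    """Hypothetical stand-in for a type describing an omitted default argument."""

    def __init__(self, value, key_on_identity=True):
        self._value = value
        self._key_on_identity = key_on_identity

    @property
    def key(self):
        if self._key_on_identity:
            # Identity-based key: equal defaults held in different objects differ.
            return (type(self._value), id(self._value))
        # Value-based key: equal defaults compare equal whatever object holds them.
        return (type(self._value), self._value)


# Two equal floats built at run time (assumption: CPython, so distinct objects).
default_a = float("1.0")
default_b = float("1.0")

identity_keys_match = ToyOmitted(default_a).key == ToyOmitted(default_b).key
value_keys_match = (
    ToyOmitted(default_a, key_on_identity=False).key
    == ToyOmitted(default_b, key_on_identity=False).key
)

print(identity_keys_match)  # False under the CPython assumption above
print(value_keys_match)     # True
```

With an identity-based key, anything that caches on the key treats the two equal defaults as different types. A value-based key avoids that in this toy model, though it only works as written when the default value is hashable and comparable.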

0reactions
stuartarchibaldcommented, Aug 2, 2021

Looking at this again, I think this is caused by the dispatcher function cache being missed because the typecodes of the Omitted values differ across invocations (which is why changing the .key helps), though it’s not obvious what is going on.

In the above, the typecodes of the omitted float and int are (locally) 373 and 375 for func, and the dispatcher cache bakes these in. When %timeit runs on func, the dispatcher compares the signature against the typecodes for the omitted float and int, gets a perfect match, and so just executes the function.

In the func2 invocation, the dispatcher “sees” a signature using the 373 and 375 typecodes, but the computed typecodes of the omitted float and int are (locally) 1682 and something else respectively (the code jumps out of the loop at the first mismatch). There is therefore no match and a recompile, hence the performance difference. What’s also “strange” is that each invocation of func2 appears to get a new set of computed typecodes for its arguments, yet for some reason the same doesn’t happen in func.

I’m reasonably convinced that the performance difference comes down to the dispatcher matching the typecodes exactly in the first run of func and then never matching again; the resulting recompilation is what causes the difference. This, however, is just a side effect of something else: it does not explain why the first is “typecode stable” and the second is not.

Demonstration:

# FIRST RUN IS FAST, DISPATCHER MATCHES ON TYPECODE
In [1]: import numba
   ...: 
   ...: @numba.njit
   ...: def func(x, y=1.0, z=1):
   ...:     return x+x + y + z
   ...: print(func(2.2))
   ...: 
   ...: %timeit func(2.2)
6.4
491 ns ± 0.859 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# SECOND IDENTICAL RUN IS SLOW, DISPATCHER DOES NOT MATCH ON TYPECODE
In [2]: import numba
   ...: 
   ...: @numba.njit
   ...: def func(x, y=1.0, z=1):
   ...:     return x+x + y + z
   ...: print(func(2.2))
   ...: 
   ...: %timeit func(2.2)
6.4
92 µs ± 97.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# THIRD RUN WITH RENAMED FUNC IS SLOW, DISPATCHER DOES NOT MATCH ON TYPECODE
In [3]: import numba
   ...: 
   ...: @numba.njit
   ...: def RENAMED_func(x, y=1.0, z=1):
   ...:     return x+x + y + z
   ...: print(RENAMED_func(2.2))
   ...: 
   ...: %timeit RENAMED_func(2.2)
6.4
92.7 µs ± 86.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
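
The cache-miss behaviour described above can be modelled with a small pure-Python sketch. This is not Numba's dispatcher: `ToyDispatcher`, its counters, and the typecode `1` used for float64 are hypothetical, and only the values 373, 375 and 1682 come from the comment. The sketch shows only that a cache keyed on exact typecodes always hits when the typecodes are stable and never hits when each call produces fresh ones.

```python
import itertools


class ToyDispatcher:
    """Hypothetical typecode-keyed dispatch cache (not Numba's implementation)."""

    def __init__(self):
        self._cache = {}
        self.fast_hits = 0
        self.slow_paths = 0

    def call(self, typecodes, impl):
        key = tuple(typecodes)
        if key in self._cache:
            # Exact typecode match: run the cached implementation directly.
            self.fast_hits += 1
        else:
            # No match: take the slow path and remember the result under this key.
            self.slow_paths += 1
            self._cache[key] = impl
        return self._cache[key]


def impl(x):
    return x + x + 1.0 + 1


# Stable typecodes: every call after the first hits the cache.
stable = ToyDispatcher()
for _ in range(5):
    stable.call((1, 373, 375), impl)

# Unstable typecodes: each call computes new codes for the omitted arguments.
fresh_codes = itertools.count(1682)
unstable = ToyDispatcher()
for _ in range(5):
    unstable.call((1, next(fresh_codes), next(fresh_codes)), impl)

print(stable.fast_hits, stable.slow_paths)      # 4 1
print(unstable.fast_hits, unstable.slow_paths)  # 0 5
```

In this model the slow path is cheap, whereas in the timings above it is what appears to account for the jump from hundreds of nanoseconds to tens of microseconds per call.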

Outstanding questions:

  • How does the first invocation manage to get the same typecodes for its computed values for omitted args while the second invocation has them increasing? (Printing them out demonstrates this; perhaps printing out more/longer runs might show them change.)
  • Why do the typecode numerical values move up by around 1000 after the first run, given it looks like it’s hitting the cache, and why only around 1000?
  • Does GC (as noted by @gmarkall) help explain some part of the above questions?
  • How to fix it!