Reliable function inlining
scala-js: 0.6.27
I want to abstract over a piece of performance-critical code. With function inlining I could do everything I need, but some functions are sometimes not inlined, and I can't see a pattern in when a function is inlined and when it is not.
Simple example:
@inline def foo1(
    bar: (() => Boolean) => Boolean,
): Boolean = {
  println("sideeffect")
  bar(() => false) && bar(() => false)
}
println("foo1" + foo1(bar = _()))
is properly inlined by fastOptJS:
var this$27 = $m_s_Console$();
var this$28 = $as_Ljava_io_PrintStream(this$27.outVar$2.v$1);
this$28.java$lang$JSConsoleBasedPrintStream$$printString__T__V("sideeffect\n");
var x = ("foo1" + false);
var this$30 = $m_s_Console$();
var this$31 = $as_Ljava_io_PrintStream(this$30.outVar$2.v$1);
this$31.java$lang$JSConsoleBasedPrintStream$$printString__T__V((x + "\n"));
But this example:
@inline def foo2(
    bar: (() => Boolean) => Boolean,
): Boolean = {
  println("sideeffect")
  if (bar(() => false)) 5 else 7
  bar(() => false)
}
println("foo2" + foo2(bar = _()))
is not:
var f = (function(this$2$1) {
  return (function(x$2$2) {
    var x$2 = $as_F0(x$2$2);
    return $uZ(x$2.apply__O())
  })
})(this);
var this$33 = $m_s_Console$();
var this$34 = $as_Ljava_io_PrintStream(this$33.outVar$2.v$1);
this$34.java$lang$JSConsoleBasedPrintStream$$printString__T__V("sideeffect\n");
var arg1$3 = new $c_sjsr_AnonFunction0().init___sjs_js_Function0((function($this) {
  return (function() {
    return false
  })
})(this));
$uZ(f(arg1$3));
var arg1$4 = new $c_sjsr_AnonFunction0().init___sjs_js_Function0((function(this$2$2) {
  return (function() {
    return false
  })
})(this));
var x$1 = ("foo2" + $uZ(f(arg1$4)));
var this$36 = $m_s_Console$();
var this$37 = $as_Ljava_io_PrintStream(this$36.outVar$2.v$1);
this$37.java$lang$JSConsoleBasedPrintStream$$printString__T__V((x$1 + "\n"));
Why is inlining not working in the second example? It's evidently not about the function signature, which is identical, but somehow about how the argument is used in the function body.
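A plain-JVM sketch (no Scala.js optimizer involved) makes one run-time difference between the two bodies visible. It drops the `println` and adds a hypothetical `calls` counter and `run` helper for illustration; the rest follows `foo1` and `foo2` above. With `bar = _()`, the `&&` in `foo1` short-circuits after the first `false`, so `bar` runs once, while `foo2` always calls it twice. Whether this is what the optimizer's decision actually depends on is not established here.

```scala
// JVM sketch: foo1/foo2 bodies as in the report, minus the println.
// The `calls` counter and the `run` helper are illustrative additions.
object InlineSemantics {
  var calls = 0

  def foo1(bar: (() => Boolean) => Boolean): Boolean =
    bar(() => false) && bar(() => false) // && stops after the first false

  def foo2(bar: (() => Boolean) => Boolean): Boolean = {
    if (bar(() => false)) 5 else 7 // value discarded; only the call happens
    bar(() => false)
  }

  // Calls `f` with the same `bar = _()` argument as the report, counting
  // how often `bar` is invoked. Returns (result, invocation count).
  def run(f: ((() => Boolean) => Boolean) => Boolean): (Boolean, Int) = {
    calls = 0
    val result = f { g => calls += 1; g() }
    (result, calls)
  }
}
```

`InlineSemantics.run(InlineSemantics.foo1)` returns `(false, 1)` and `InlineSemantics.run(InlineSemantics.foo2)` returns `(false, 2)`.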
A more complex example, the one I'm actually working on (I replaced custom array data structures with mutable.Stack and mutable.HashSet to reduce code size and make it a self-contained example):
@inline def depthFirstSearchGeneric[PROCESSRESULT](
    vertexCount: Int,
    foreachSuccessor: (Int, Int => Unit) => Unit, // (idx, f) => successors(idx).foreach(f)
    init: (Int => Unit, collection.mutable.Stack[Int]) => Unit, // (enqueue,_) => enqueue(start)
    processVertex: Int => PROCESSRESULT, // result += _
    loopConditionGuard: (() => Boolean) => Boolean = condition => condition(),
    advanceGuard: (PROCESSRESULT, () => Unit) => Unit =
      (result: PROCESSRESULT, advance: () => Unit) => advance(),
    enqueueGuard: (Int, () => Unit) => Unit = (elem, enqueue) => enqueue()
): Unit = {
  val stack = new collection.mutable.Stack[Int] // ArrayStackInt.create(capacity = vertexCount)
  val visited = new collection.mutable.HashSet[Int] // ArraySet.create(vertexCount)
  @inline def enqueue(elem: Int): Unit = {
    enqueueGuard(elem, { () =>
      stack.push(elem)
      visited += elem
    })
  }
  init(enqueue, stack)
  while (loopConditionGuard(() => !stack.isEmpty)) {
    val current = stack.pop()
    visited += current
    advanceGuard(
      processVertex(current),
      () =>
        foreachSuccessor(current, { next =>
          if (!visited.contains(next)) {
            enqueue(next)
          }
        })
    )
  }
}
val edges = Array(Array[Int](1), Array[Int](0))
depthFirstSearchGeneric(
  edges.size,
  (idx, f) => edges(idx).foreach(f),
  init = (enqueue, _) => enqueue(0),
  processVertex = v => println(v)
)
This produces:
// val edges = Array(Array[Int](1), Array[Int](0))
var array = [1];
var xs = new $c_sjs_js_WrappedArray().init___sjs_js_Array(array);
var len = $uI(xs.array$6.length);
var array$1 = $newArrayObject($d_I.getArrayOf(), [len]);
var elem$1 = 0;
elem$1 = 0;
var this$13 = new $c_sc_IndexedSeqLike$Elements().init___sc_IndexedSeqLike__I__I(xs, 0, $uI(xs.array$6.length));
while (this$13.hasNext__Z()) {
  var arg1 = this$13.next__O();
  array$1.set(elem$1, $uI(arg1));
  elem$1 = ((1 + elem$1) | 0)
};
var array$2 = [0];
var xs$1 = new $c_sjs_js_WrappedArray().init___sjs_js_Array(array$2);
var len$1 = $uI(xs$1.array$6.length);
var array$3 = $newArrayObject($d_I.getArrayOf(), [len$1]);
var elem$1$1 = 0;
elem$1$1 = 0;
var this$20 = new $c_sc_IndexedSeqLike$Elements().init___sc_IndexedSeqLike__I__I(xs$1, 0, $uI(xs$1.array$6.length));
while (this$20.hasNext__Z()) {
  var arg1$1 = this$20.next__O();
  array$3.set(elem$1$1, $uI(arg1$1));
  elem$1$1 = ((1 + elem$1$1) | 0)
};
var array$4 = [array$1, array$3];
var xs$2 = new $c_sjs_js_WrappedArray().init___sjs_js_Array(array$4);
var len$2 = $uI(xs$2.array$6.length);
var array$5 = $newArrayObject($d_I.getArrayOf().getArrayOf(), [len$2]);
var elem$1$2 = 0;
elem$1$2 = 0;
var this$28 = new $c_sc_IndexedSeqLike$Elements().init___sc_IndexedSeqLike__I__I(xs$2, 0, $uI(xs$2.array$6.length));
while (this$28.hasNext__Z()) {
  var arg1$2 = this$28.next__O();
  array$5.set(elem$1$2, arg1$2);
  elem$1$2 = ((1 + elem$1$2) | 0)
};
// The four function arguments that are not inlined
// (vertexCount, init and processVertex are inlined):
var foreachSuccessor = new $c_sjsr_AnonFunction2().init___sjs_js_Function2((function($this, edges) {
  return (function(idx$2, f$2) {
    var idx = $uI(idx$2);
    var f = $as_F1(f$2);
    var xs$3 = edges.get(idx);
    var i = 0;
    var len$3 = xs$3.u.length;
    while ((i < len$3)) {
      var idx$1 = i;
      f.apply__O__O(xs$3.get(idx$1));
      i = ((1 + i) | 0)
    }
  })
})(this, array$5));
var loopConditionGuard = this.depthFirstSearchGeneric$default$5$1__p1__F1();
var advanceGuard = this.depthFirstSearchGeneric$default$6$1__p1__F2();
var enqueueGuard = this.depthFirstSearchGeneric$default$7$1__p1__F2();
// function body
var stack = new $c_scm_Stack().init___();
var visited = new $c_scm_HashSet().init___();
enqueueGuard.apply__O__O__O(0, new $c_sjsr_AnonFunction0().init___sjs_js_Function0((function($this$1, stack$1, elem, visited$1) {
  return (function() {
    stack$1.push__O__scm_Stack(elem);
    visited$1.$$plus$eq__O__scm_HashSet(elem)
  })
})(this, stack, 0, visited)));
while ($uZ(loopConditionGuard.apply__O__O(new $c_sjsr_AnonFunction0().init___sjs_js_Function0((function(this$2$1, stack$2) {
  return (function() {
    return (!stack$2.elems$5.isEmpty__Z())
  })
})(this, stack))))) {
  var current = $uI(stack.pop__O());
  visited.$$plus$eq__O__scm_HashSet(current);
  var this$35 = $m_s_Console$();
  var this$36 = $as_Ljava_io_PrintStream(this$35.outVar$2.v$1);
  this$36.java$lang$JSConsoleBasedPrintStream$$printString__T__V((current + "\n"));
  advanceGuard.apply__O__O__O((void 0), new $c_sjsr_AnonFunction0().init___sjs_js_Function0((function(this$3$1, foreachSuccessor$1, current$1, visited$2, enqueueGuard$1, stack$3) {
    return (function() {
      foreachSuccessor$1.apply__O__O__O(current$1, new $c_sjsr_AnonFunction1().init___sjs_js_Function1((function($this$2, visited$1$1, enqueueGuard$1$1, stack$1$1) {
        return (function(next$2) {
          var next = $uI(next$2);
          if ((!$f_scm_FlatHashTable__containsElem__O__Z(visited$1$1, next))) {
            enqueueGuard$1$1.apply__O__O__O(next, new $c_sjsr_AnonFunction0().init___sjs_js_Function0((function($this$3, stack$1$2, elem$2, visited$1$2) {
              return (function() {
                stack$1$2.push__O__scm_Stack(elem$2);
                visited$1$2.$$plus$eq__O__scm_HashSet(elem$2)
              })
            })($this$2, stack$1$1, next, visited$1$1)))
          }
        })
      })(this$3$1, visited$2, enqueueGuard$1, stack$3)))
    })
  })(this, foreachSuccessor, current, visited, enqueueGuard, stack)))
};
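To make concrete what the guard parameters buy over hand-written variants, here is a plain-JVM sketch. The function body is essentially the one above (with `scala.collection.mutable` imported and the collections built via `empty`), and the hypothetical helper `traverseWithStop` passes a non-default `advanceGuard` so that the successors of one chosen vertex are never expanded. Because it runs on the JVM, it says nothing about what the Scala.js optimizer inlines; it only pins down the traversal semantics that any inlined code has to preserve.

```scala
import scala.collection.mutable

// JVM sketch of the generic DFS from the report; only the imports,
// collection construction and the hypothetical helper below are new.
object DfsSketch {
  @inline def depthFirstSearchGeneric[PROCESSRESULT](
      vertexCount: Int,
      foreachSuccessor: (Int, Int => Unit) => Unit,
      init: (Int => Unit, mutable.Stack[Int]) => Unit,
      processVertex: Int => PROCESSRESULT,
      loopConditionGuard: (() => Boolean) => Boolean = condition => condition(),
      advanceGuard: (PROCESSRESULT, () => Unit) => Unit =
        (result: PROCESSRESULT, advance: () => Unit) => advance(),
      enqueueGuard: (Int, () => Unit) => Unit = (elem, enqueue) => enqueue()
  ): Unit = {
    val stack = mutable.Stack.empty[Int]
    val visited = mutable.HashSet.empty[Int]
    @inline def enqueue(elem: Int): Unit = {
      enqueueGuard(elem, { () =>
        stack.push(elem)
        visited += elem
      })
    }
    init(enqueue, stack)
    while (loopConditionGuard(() => !stack.isEmpty)) {
      val current = stack.pop()
      visited += current
      advanceGuard(
        processVertex(current),
        () =>
          foreachSuccessor(current, { next =>
            if (!visited.contains(next)) enqueue(next)
          })
      )
    }
  }

  // Hypothetical helper: records the visiting order and uses a custom
  // advanceGuard so that the successors of `stopAt` are never expanded.
  def traverseWithStop(edges: Array[Array[Int]], stopAt: Int): List[Int] = {
    val order = mutable.ListBuffer.empty[Int]
    depthFirstSearchGeneric[Boolean](
      edges.length,
      (idx, f) => edges(idx).foreach(f),
      init = (enqueue, _) => enqueue(0),
      processVertex = { v => order += v; v != stopAt }, // true = expand v
      advanceGuard = (expand, advance) => if (expand) advance()
    )
    order.toList
  }
}
```

With edges `0 -> {1, 2}`, `1 -> {3}` and `stopAt = 1`, the stack pops 2 before 1 and vertex 3 is never reached, giving `List(0, 2, 1)`; with no stop vertex the result is `List(0, 2, 1, 3)`. On Scala 2.12, `mutable.Stack` is deprecated, so expect a compiler warning there.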
At first I expected everything to be inlined. Then I thought the Closure Compiler or the JIT might optimize these cases, but I checked the generated code and ran benchmarks: they don't optimize it the way I want, and there is a measurable performance hit when the functions are not inlined.
So my alternative solutions are:
- Hard-code all needed variations by hand. This is where I'm coming from: I had about 15 variations of this function, made lots of mistakes, and it was very hard to maintain.
- Write a macro, which I would like to avoid.
Are there any other options I didn't think of? Is it possible to tweak my function so that all arguments are inlined?
Sorry for this very long issue and thanks for your help! 😅
- Created 4 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
I managed to get everything inlined the way you want with the following variant. But it's awkward, because I had to duplicate the enqueueGuard param, and two lambdas doing the same thing would need to be passed as actual arguments if you want to use something other than the default at the call site:
I'm going to close this, as I don't think there's anything actionable for this repo.