Bytecode generation for tailrec methods uses temporary variables which confuses debuggers
See original GitHub issueCompiler version
Scala 3.1.1
Minimized code
import scala.annotation.tailrec
object main {
@tailrec
def add(x: Int, y: Int): Int =
if (x == 0) y else add(x - 1, y + 1)
}
Scala 2.13.8 compiles the code as follows (only the add
method bytecode is shown, once in a more compact form, just the bytecodes, and then a verbose form with local variables as well).
// just the code
public int add(int, int);
Code:
0: iload_1
1: iconst_0
2: if_icmpne 9
5: iload_2
6: goto 20
9: iload_1
10: iconst_1
11: isub
12: iload_2
13: iconst_1
14: iadd
15: istore_2
16: istore_1
17: goto 0
20: ireturn
// same code, verbose with local variables
// access flags 0x1
public add(II)I
// parameter final x
// parameter final y
L0
LINENUMBER 5 L0
FRAME SAME
ILOAD 1
ICONST_0
IF_ICMPNE L1
ILOAD 2
GOTO L2
L1
FRAME SAME
ILOAD 1
ICONST_1
ISUB
ILOAD 2
ICONST_1
IADD
ISTORE 2
ISTORE 1
GOTO L0
L2
FRAME SAME1 I
IRETURN
L3
LOCALVARIABLE this Lmain$; L0 L3 0
LOCALVARIABLE x I L0 L3 1
LOCALVARIABLE y I L0 L3 2
MAXSTACK = 3
MAXLOCALS = 3
Please take a note that only 3 local variables are used to compile the method (a reference to this
, which is the outer object, irrelevant for the issue, and a local variable for each of the x
and y
function params.
Now let’s look at the bytecode generated by Scala 3.1.1, for the same exact code. Again, the code is shown twice, once in a more compact form, and once in a verbose form that shows the local variables.
// just the code
public int add(int, int);
Code:
0: iload_2
1: istore_3
2: iload_1
3: istore 4
5: iload 4
7: iconst_0
8: if_icmpne 15
11: iload_3
12: goto 36
15: iload 4
17: iconst_1
18: isub
19: istore 5
21: iload_3
22: iconst_1
23: iadd
24: istore 6
26: iload 5
28: istore 4
30: iload 6
32: istore_3
33: goto 37
36: ireturn
37: goto 5
40: athrow
41: athrow
// same code, verbose with local variables
// access flags 0x1
public add(II)I
// parameter final x
// parameter final y
L0
LINENUMBER 5 L0
ILOAD 2
ISTORE 3
ILOAD 1
ISTORE 4
L1
FRAME APPEND [I I]
ILOAD 4
ICONST_0
IF_ICMPNE L2
ILOAD 3
GOTO L3
L2
FRAME SAME
ILOAD 4
ICONST_1
ISUB
ISTORE 5
ILOAD 3
ICONST_1
IADD
ISTORE 6
ILOAD 5
ISTORE 4
ILOAD 6
ISTORE 3
GOTO L4
L3
FRAME SAME1 I
IRETURN
L4
FRAME APPEND [I I]
GOTO L1
L5
FRAME FULL [] [java/lang/Throwable]
ATHROW
L6
FRAME SAME1 java/lang/Throwable
ATHROW
L7
LOCALVARIABLE this Lmain$; L0 L7 0
LOCALVARIABLE x I L0 L7 1
LOCALVARIABLE y I L0 L7 2
MAXSTACK = 2
MAXLOCALS = 7
As you can see, the same code is compiled using 7 local variables, 3 named ones (the same as in Scala 2) and 4 unnamed ones. Let’s try to figure out a sort of mapping between them.
Right at the beginning of the function code, we have iload_2
and istore_3
, which moves the value of the third (0 based indexing) local variable (named y
) into an unnamed local variable with index 3.
Following that, we have iload_1
and istore 4
, again, moving the value of the first local variable (named x
) into an unnamed local variable with index 4. Thus, right at the beginning of the function, both function parameters were just copied over into new local variables.
Later down in the function, we can see that the results of x - 1
and y + 1
are stored again in these 2 unnamed variables (index 4 for x
and index 3 for y
), so, this means that effectively, the x
and y
function parameters are unused during the lifecycle of the function (they contain the original values that the function was called with), while the unnamed local variables with index 4 and index 3 serve as the effective stores of value for x
and y
during the recursive execution of the function.
Furthermore, we have a couple of istore 5
and istore 6
instructions which only transiently store the results of x - 1
and y + 1
, right before they are moved to the local variables 4 and 3, respectively, and the function body is executed again. Compared to Scala 2, this function uses 4 variables and ignores 2, to do a job that can be done with just the 2 function parameters as the stores of value.
As a real world effect of this situation, the IntelliJ IDEA debugger for Scala 3 (any version) does not work correctly with tail-recursive functions. The debugger shows the function parameters x
and y
as stack frame values, and they never change their values, which is supported by the bytecode above. Of course, other local values can be shown, but the experience is not great.
My question thus is, is this an intentional change in Scala 3 (I personally cannot see that being the case, because the code generation scheme produces much larger bytecode with larger stack requirements), or is this an ommission in Scala 3 due to some changes in the bytecode generation in Scala 2 not being ported over to dotty during development?
In any case, how should tooling builders approach this issue? Should we try to issue a fix in Scala 3, or try to work around it in the tooling? I don’t believe this is a problem for IntelliJ only, most debuggers would interpret the bytecode as it is produced.
Thanks in advance.
Issue Analytics
- State:
- Created a year ago
- Comments:11 (3 by maintainers)
Top GitHub Comments
PR available: #14865.
As @smarter said, tailrec functions are internally transformed in a much different way than in Scala 2. This is during tree transformations, several phases before emitting bytecode. Scala 2 had arbitrary labels and gotos, whereas Scala 3 only has “labeled blocks” (a form similar to
label: { statements; }
in JavaScript). Scala 2 also allowed its internal trees to set the values of immutable parameters in that case, which is something that Scala 3 does not allow anymore.This change of scheme has an impact on the JVM bytecode that is emitted, indeed. This was expected. I did not anticipate that it would impair debuggers, however (the semantics are the same, obviously).
It would be possible to alter the bytecode generation to reuse the slots of
x
andy
instead of the mutable temporaries that are used within the implementations (3 and 4 in your analysis). That would require some analysis to determine thatx
andy
are only used to initialize those temporaries, which may not be trivial to do, but doable. IIUC, that should be enough to improve the debugging experience.Regarding the extra temporaries used to change the values of the temporaries when doing the recursive call, that is a bit more complicated. In theory, a local bytecode optimizer could get rid of them, but this is more difficult to analyze. Unless fixing that would also be required to improve the debugging experience, I would not suggest investing time in that specific case.