Consider syntax with significant indentation
I have been playing for a while now with ways to make Scala’s syntax indentation-based. I have always admired the neatness of Python syntax, and I also found that F# has benefited greatly from its optional indentation-based syntax, so much so that nobody seems to use the original syntax anymore. I had some good conversations with @lihaoyi at Scala Exchange in 2015 about this. At the time there were some issues I was not yet happy with, notably how to elide braces around arguments to user-defined functions. I now have a proposal that addresses these issues.
Proposal in a Nutshell
- If certain keywords are followed by an end-of-line and an indented code block, assume block structure as if braces were inserted around the indented block. Example:

      def f(x: Int) =
        val y = x * x
        y + 1

  is treated as equivalent to

      def f(x: Int) = {
        val y = x * x
        y + 1
      }

  Or, for match expressions:

      xs match
        case x :: xs1 => ...
        case Nil => ...

  Or, using the new syntax for if-then-else:

      if condition then
        println("taken")
        x
      else
        println("not taken")
        y

  Or for for-expressions:

      for
        x <- xs
        y <- ys
      yield
        f(x, y)

- Use `with` + indentation as an alternative way to delimit statement sequences in objects and classes. Example:

      object Obj with
        class C(x: Int) with
          def f = x + 3
        def apply(x: Int) = new C(x)

- Also use `with` + indentation as an alternative way to pass arguments that were formerly in braces to functions. Examples:

      xs.map with x =>
        x + 2

      xs.collect with
        case P1 => E1
        case P2 => E2

Motivation
Why use indentation-based syntax?
- Cleaner typography: We are all used to writing

      def f() = {
        ...
      }

  But if we look at it with fresh eyes it’s really quite weird how the braces embrace nothing but empty space. Geometrically the braces point away from the enclosed space `...`. One could argue that other brace schemes are better. But there are good reasons why the scheme above is the most popular, and in a sense arguments about how to make braces look less awkward are themselves an indication that braces are fundamentally problematic. It’s much simpler and cleaner to get rid of them:

      def f() =
        ...

- Regain control of vertical white space. Most of us are very particular about how we organize horizontal whitespace, with strict rules for indentation and alignment. We are much less demanding about vertical whitespace. With braces, we cannot be, because vertical whitespace is fully determined by the number of closing braces. So two definitions might be separated by a single blank line, or by many (almost) blank lines if there are closing braces. One could avoid this by putting several closing braces on one line, but this looks weird and therefore has not caught on.
- Ease of learning. There are some powerful arguments why indentation-based syntax is easier to learn.
- Less prone to errors. Braces are a weak signal, much weaker than indentation. So when brace structure and indentation differ, we misunderstand what was written. The code below exhibits a common problem:

      if (condition)
        println("something")
        action()

  Indentation fools us into believing that `action` is executed only if `condition` is true. But of course that’s not the case, because we forgot to add braces.
- Easier to change. A situation like the one above happens particularly often when one adds the first `println` statement after the fact. To protect against modification problems like this, some people suggest always writing braces even if they only enclose a single statement or expression. But that sort of boilerplatey defensive programming is generally not considered good practice in Scala.
- Better vertical alignment. In the most commonly used brace scheme, an if-then-else reads like this:

      if (cond1) {
        ...
      } else if (cond2) {
        ...
      } else {
        ...
      }

  Instead of nicely aligned keywords, we find weird-looking closing braces `}` at the beginning of the most important lines in this code. We are all used to this, but that does not make it good. I have recently changed my preferred style to:

      if (cond1) {
        ...
      }
      else if (cond2) {
        ...
      }
      else {
        ...
      }

  This solves the alignment issue: the `if` and the `else`s are now vertically aligned. But it gives up even more control over vertical whitespace.
Impediments
What are the reasons for preferring braces over indentations?
- Provide visual clues where constructs end. With pure indentation-based syntax it is sometimes hard to tell how many levels of indentation are terminated at some point. In the code below, it is still easy to see that `j` is on the same nesting level as `g`. But if there were many more lines between the definitions, it might not be.

      def f =
        def g =
          def h =
            def i = 1
            i
        def j = 2

  With interpreted end-comments, the proposal includes a way to mitigate this issue.
- Susceptibility to off-by-one indentation. It’s easy to make a mistake so that indentation is off by a space to the left or the right. Without proper guards, indentation-based syntax would interpret misalignment as nesting, which can lead to errors that are very hard to diagnose.

  In the proposal, an indented block is always preceded by a keyword that must be followed by a nested expression or statement sequence. So it is impossible to accidentally introduce nesting by veering off to the right. I experimented with the following additional rule, which would make it unlikely to accidentally terminate nesting by veering off to the left:

  - When terminating an indented block by a new statement that starts further to the left than the block, it is checked that the new statement aligns exactly with previous statements at the same indentation level.

  That rule proves to be quite constraining (for instance it would outlaw the chained `filter` and `map` operations in the example below), so it is currently not implemented.
- Editor support. Editors typically have ways to highlight matching pairs of braces. Without that support, it becomes harder to understand the nesting structure of a program. On the other hand, it’s also straightforward to provide navigation help for indentation-based syntax, for instance by providing a command to go to the start of the previous or next definition at the indentation level of the cursor. According to @lihaoyi’s comment, major editors do something like this already.
But none of these points is the strongest argument against indentation. The strongest argument is clearly:
- Cost of change. It would be expensive in many dimensions to change to indentation-based syntax. To be sure, the present proposal for indentation-based syntax still allows braces, so existing programs would still compile. But there are other costs as well. If the new indentation syntax is not universally adopted, we incur the cost that there will be two visually distinct ways to structure Scala code. People used to one way will be less comfortable reading the other. If the new indentation syntax does take over as a universal standard (which I would expect), we have rendered outmoded all blogs, books, StackOverflow answers and other technical information that used the old syntax. It will take a long time to change all that, and the transition will be awkward.
Proposal in Detail
Expanded use of `with`
While we are about to phase out `with` as a connective for types, we propose to add it in two new roles for definitions and terms. For definitions, we allow `with` as an optional prefix of (so far brace-enclosed) statement sequences in templates, packages, and enums. For terms, we allow `with` as another way to express function application. `f with { e }` is the same as `f{e}`. This second rule looks redundant at first, but will become important once significant indentation is added. The proposed syntax changes are as described in this diff.
Significant Indentation
In code outside of braces, parentheses or brackets we maintain a stack of indentation levels. At the start of the program, the stack consists of the indentation level zero.
If a line ends in one of the keywords `=`, `if`, `then`, `else`, `match`, `for`, `yield`, `while`, `do`, `try`, `catch`, `finally` or `with`, and the next token starts in a column greater than the topmost indentation level of the stack, an open brace `{` is implicitly inserted and the starting column of the token is pushed as new top entry on the stack.
If a line starts in a column smaller than the current topmost indentation level, it is checked that there is an entry in the stack whose indentation level precisely matches the start column. The stack is popped until that entry is at the top and for each popped entry a closing brace `}` is implicitly inserted. If there is no entry in the stack whose indentation level precisely matches the start column an error is issued.
None of these steps is taken in code that is enclosed in braces, parentheses or brackets.
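The stack discipline described above can be sketched in a few lines of ordinary (brace-style) Scala. This is only an illustration, not the implementation in #2488: it works on whole lines rather than tokens, ignores strings, comments and tabs, assumes no braces, parentheses or brackets span several lines, and assumes that end of input closes all open blocks. The names `IndentSketch`, `openers` and `insertBraces` are made up for this sketch.

```scala
import scala.collection.mutable.ArrayBuffer

object IndentSketch {
  // Line-ending tokens that may open an indented block (list copied from the proposal).
  val openers: Set[String] = Set(
    "=", "if", "then", "else", "match", "for", "yield",
    "while", "do", "try", "catch", "finally", "with")

  // Start column of a line, assuming indentation uses spaces only.
  private def column(line: String): Int = line.takeWhile(_ == ' ').length

  private def endsInOpener(line: String): Boolean =
    line.trim.split("\\s+").lastOption.exists(openers.contains)

  /** Returns the lines with explicit braces, or an error message for a bad dedent. */
  def insertBraces(lines: Seq[String]): Either[String, List[String]] = {
    var stack = List(0)        // indentation levels, innermost first
    var pendingOpen = false    // did the previous line end in an opener?
    var error: Option[String] = None
    val out = ArrayBuffer.empty[String]
    val it = lines.iterator.filter(_.trim.nonEmpty)

    while (it.hasNext && error.isEmpty) {
      val line = it.next()
      val col = column(line)
      if (pendingOpen && col > stack.head) {
        // Opener followed by a deeper line: insert `{` and push the new level.
        out(out.length - 1) = out.last + " {"
        stack = col :: stack
      } else if (col < stack.head) {
        if (!stack.contains(col))
          error = Some(s"no enclosing indentation level at column $col: ${line.trim}")
        else
          // Pop until the matching level is on top, one `}` per popped entry.
          while (stack.head > col) {
            stack = stack.tail
            out += (" " * stack.head) + "}"
          }
      }
      if (error.isEmpty) {
        out += line
        pendingOpen = endsInOpener(line)
      }
    }
    // Assumption of this sketch: end of input closes every block still open.
    while (stack.tail.nonEmpty) {
      stack = stack.tail
      out += (" " * stack.head) + "}"
    }
    error.toLeft(out.toList)
  }
}
```

For the first example of the proposal, `IndentSketch.insertBraces(Seq("def f(x: Int) =", "  val y = x * x", "  y + 1"))` yields the lines `def f(x: Int) = {`, `  val y = x * x`, `  y + 1` and `}`. A dedent to a column that is not on the stack yields a `Left` with an error message.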
Lambdas with `with`
A special convention allows the common layout of lambda arguments without braces, as in:

    xs.map with x =>
      ...

The rule is as follows: If a line contains an occurrence of the `with` keyword, and that same line ends in a `=>` and is followed by an indented block, and neither the `with` nor the `=>` is enclosed by braces, parentheses or brackets, an open brace `{` is assumed directly following the `with` and a matching closing brace is assumed at the end of the indented block.
If there are several occurrences of `with` on the same line that match the condition above, the last one is chosen as the start of the indented block.
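As a rough illustration of the lookahead this rule needs, the following Scala sketch scans one line and reports where the implicit `{` would go. It is a character-level approximation made up for this note (the names `LambdaWithSketch` and `braceIndex` are hypothetical), it ignores strings and comments, and it is not how the lexical analyzer in #2488 is written.

```scala
object LambdaWithSketch {
  private def isIdChar(c: Char): Boolean = c.isLetterOrDigit || c == '_'

  private def indent(s: String): Int = s.takeWhile(_ == ' ').length

  /** Index just after the chosen `with`, where `{` would be assumed, if the rule applies.
    * `next` is the following line, used to check that an indented block follows.
    */
  def braceIndex(line: String, next: String): Option[Int] = {
    val trimmed = line.replaceAll("\\s+$", "")
    var depth = 0                      // nesting depth of (), [] and {}
    var lastWith: Option[Int] = None   // end of the last unenclosed `with`
    var i = 0
    while (i < trimmed.length) {
      val c = trimmed.charAt(i)
      if ("([{".indexOf(c) >= 0) depth += 1
      else if (")]}".indexOf(c) >= 0) depth -= 1
      else if (depth == 0 && trimmed.startsWith("with", i)) {
        // Only accept `with` as a whole word, not as part of an identifier.
        val startsWord = i == 0 || !isIdChar(trimmed.charAt(i - 1))
        val endsWord = i + 4 >= trimmed.length || !isIdChar(trimmed.charAt(i + 4))
        if (startsWord && endsWord) lastWith = Some(i + 4)
      }
      i += 1
    }
    val endsInArrow = depth == 0 && trimmed.endsWith("=>")
    val blockFollows = next.trim.nonEmpty && indent(next) > indent(line)
    if (endsInArrow && blockFollows) lastWith else None
  }
}
```

With this sketch, `braceIndex("xs.map with x =>", "  x + 2")` returns `Some(11)`, the position right after `with`. A line such as `f(a with b) =>` returns `None`, because its only `with` is enclosed in parentheses.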
Interpreted End-Comments
If a statement follows a long indented code block, it is sometimes difficult as a writer to ensure that the statement is correctly indented, or as a reader to find out to what indentation level the new statement belongs. Braces help because they show that something ends here, even though they do not say by themselves what. We can improve code understanding by adding comments when a long definition ends, as in the following code:

    def f =
      def g =
        ...
        (long code sequence)
        ...
    // end f

    def h

The proposal is to make comments like this one more useful by checking that the indentation of the `// end` comment matches the indentation of the structure it refers to. In case of discrepancy, the compiler should issue a warning like:

    // end f
    ~~~~~~
    misaligned // end, corresponds to nothing

More precisely, let an “end-comment” be a line comment of the form

    // end <id>

where `<id>` is a consecutive sequence of identifier and/or operator characters and `<id>` either ends the comment or is followed by a punctuation character `.`, `;`, or `,`. If `<id>` is one of the strings `def`, `val`, `type`, `class`, `object`, `enum`, `package`, `if`, `match`, `try`, `while`, `do`, or `for`, the compiler checks that the comment is immediately preceded by a syntactic construct described by a keyword matching `<id>` and starting in the same column as the end comment. If `<id>` is an identifier or operator name, the compiler checks that the comment is immediately preceded by a definition of that identifier or operator that starts in the same column as the end comment. If a check fails, a warning is issued.
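The recognition part of this rule can be sketched as follows. The sketch only parses a single line and compares columns; finding the construct that immediately precedes the comment needs the parser and is left out. All names (`EndCommentSketch`, `EndComment`, `parse`, `warning`) are hypothetical, and as a simplification `<id>` is taken to be any run of characters other than whitespace, `.`, `;` and `,`.

```scala
object EndCommentSketch {
  // Keywords an end-comment may name (list copied from the proposal).
  val keywords: Set[String] = Set(
    "def", "val", "type", "class", "object", "enum",
    "package", "if", "match", "try", "while", "do", "for")

  /** An end-comment: its start column, the <id> it names, and whether <id> is a keyword. */
  final case class EndComment(column: Int, id: String, namesKeyword: Boolean)

  // `// end <id>`, where <id> ends the comment or is followed by `.`, `;` or `,`.
  private val Pattern = """^( *)// end ([^\s.;,]+)([.;,].*)?\s*$""".r

  def parse(line: String): Option[EndComment] = line match {
    case Pattern(indent, id, _) =>
      Some(EndComment(indent.length, id, keywords.contains(id)))
    case _ => None
  }

  /** `constructColumn` is the start column of the matching construct that
    * immediately precedes the comment, if the caller found one.
    */
  def warning(comment: EndComment, constructColumn: Option[Int]): Option[String] =
    if (constructColumn.contains(comment.column)) None
    else Some("misaligned // end, corresponds to nothing")
}
```

For instance, `parse("  // end while")` gives `Some(EndComment(2, "while", true))`, while an ordinary comment such as `// end of the story` is not treated as an end-comment because `of` is followed by further text.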
Implementation
The proposal has been implemented in #2488. The implementation is quite similar to the way optional semicolons are supported. The bulk of the implementation can be done in the lexical analyzer, looking only at the current token and line indentation. The rule for “lambdas with `with`” requires some lookahead in the lexical analyzer to check the status at the end of the current line. The parser needs to be modified in a straightforward way to support the new syntax with the generalized use of `with`.
Example
Here’s some example code, which has been compiled with the implementation in #2488.

    object Test with

      val xs = List(1, 2, 3)

    // Plain indentation

      xs.map with
        x => x + 2
      .filter with
        x => x % 2 == 0
      .foldLeft(0) with
        _ + _

    // Using lambdas with `with`

      xs.map with x =>
        x + 2
      .filter with x =>
        x % 2 == 0
      .foldLeft(0) with
        _ + _

    // for expressions

      for
        x <- List(1, 2, 3)
        y <- List(x + 1)
      yield
        x + y

      for
        x <- List(1, 2, 3)
        y <- List(x + 1)
      do
        println(x + y)

    // Try expressions

      try
        val x = 3
        1.0 / x
      catch
        case ex: Exception =>
          0
      finally
        println("done")

    // Match expressions

      xs match
        case Nil =>
          println()
          0
        case x :: Nil =>
          1
        case _ => 2

    // While and Do

      do
        println("x")
        println("y")
      while
        println("z")
        true

      while
        println("z")
        true
      do
        println("x")
        println("y")
      // end while
    // end Test

    package p with

      object o with

        class C extends Object
                   with Serializable with

          val x = new C with
            def y = 3

          val result =
            if x == x then
              println("yes")
              true
            else
              println("no")
              false

        // end C
      // end o

Top GitHub Comments
This offers no objective improvement to the language at a cost that is not insignificant. More overloaded keywords is the last thing that helps newbies and supporting two styles or switching between them is burdensome. Developers in general will not reach a consensus on indentation vs delimiters any more than they will on tabs vs spaces or which line your curly brackets go on. Please don’t facilitate wasting effort debating this (or having to switch) in every project and leave it as it is.
Wow, this proposal has generated a lot of heat (should have expected that!) I think for now my proposed strategy will be:
- have this or a variant of it as an optional feature in early versions of Dotty, controlled by a command-line flag.
- get automatic reformatters that can switch between braces and indentation.
- experiment with the feature and get feedback from users.
Once the experiments are in, decide on whether we want to keep this.