Trade-offs in Control Flow Analysis
Some rough notes from a conversation @ahejlsberg and I had earlier about trade-offs in the control flow analysis work, based on running the real-world code (RWC) tests. For obvious reasons I'll be comparing what Flow does with similar examples to compare and contrast possible outcomes.
The primary question is: When a function is invoked, what should we assume its side effects are?
One option is to be pessimistic and reset all narrowings, assuming that any function might mutate any object it could possibly get its hands on. Another option is to be optimistic and assume the function doesn’t modify any state. Both of these seem to be bad.
This problem spans both locals (which might be subject to some “closed over or not” analysis) and object fields.
Optimistic: Bad behavior on locals
The TypeScript compiler has code like this:
```ts
enum Token { Alpha, Beta, Gamma }

let token = Token.Alpha;

function nextToken() {
  token = Token.Beta;
}

function maybeNextToken() {
  if (... something ...) {
    nextToken();
  }
}

function doSomething() {
  if (token !== Token.Alpha) {
    maybeNextToken();
  }
  // is this possible?
  if (token === Token.Alpha) {
    // something happens
  }
}
```
Optimistically assuming `token` isn't modified by `maybeNextToken` incorrectly flags `token === Token.Alpha` as an impossibility. However, in other cases, this is a good check to do! See later examples.
Optimistic: Bad behavior on fields
The RWC suite picked up a “bug” that looked like this:
```ts
// Function somewhere else
declare function tryDoSomething(x: string, result: { success: boolean; value: number; }): void;

function myFunc(x: string) {
  let result = { success: false, value: 0 };

  tryDoSomething(x, result);
  if (result.success === true) { // %%
    return result.value;
  }

  tryDoSomething(x.trim(), result);
  if (result.success === true) { // ??
    return result.value;
  }

  return -1;
}
```
The `??` line here is not a bug in the user code, but we thought it was, because after the `%%` block runs, the only remaining value in `result.success`'s domain is `false`.
Pessimistic: Bad behavior on locals
We found actual bugs (several!) in partner code that looked like this:
```ts
enum Kind { Good, Bad, Ugly }

let kind: Kind = ...;

function f() {
  if (kind) {
    log('Doing some work');
    switch (kind) {
      case Kind.Good:
        // unreachable!
    }
  }
}
```
Here, we detected the bug that `Kind.Good` (which has the falsy value `0`) is not in the domain of `kind` at the point of the `case` label. However, if we were fully pessimistic, we couldn't know that the global function `log` doesn't modify the global variable `kind`, thus incorrectly allowing this broken code.
Pessimistic: Bad behavior on fields, example 1
A question on the flowtype Stack Overflow tag is a good example of this. A smaller example that demonstrates the behavior:
```ts
function fn(arg: { x: string | null }) {
  if (arg.x !== null) {
    alert('All is OK!');
    // Flow: Not OK, arg.x could be null
    console.log(arg.x.substr(3));
  }
}
```
The problem here is that, pessimistically, something like this might be happening:
```ts
let a = { x: 'ok' };

function alert() {
  a.x = null;
}

fn(a);
```
Pessimistic: Bad behavior on fields, example 2
The TS compiler has code that looks like this (simplified):
```ts
function visitChildren(node: Node, visit: (node: Node) => void) {
  switch (node.kind) {
    case SyntaxKind.BinaryExpression:
      visit(node.left);
      visit(node.right); // Unsafe?
      break;
    case SyntaxKind.PropertyAccessExpression:
      visit(node.expr);
      visit(node.name); // Unsafe?
      break;
  }
}
```
Here, we discriminated the `Node` union type by its `kind`. A pessimistic behavior would say that the second invocations are unsafe, because the call to `visit` may have mutated `node.kind` through a secondary reference and invalidated the discrimination.
Mitigating with (shallow) inlining / analysis
Flow does some assignment analysis to improve the quality of these errors, but it’s obviously short of a full inlining solution, which wouldn’t be even remotely practical. Some examples of how to defeat the analysis:
```ts
// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
  function check1() {
    x = 'still OK';
  }

  if (x !== null) {
    check1();
    // Flow: Error, x could be null
    console.log(x.substr(0));
  }
}
```
```ts
// Inlining is only one level deep
function fn(x: string | null) {
  function check1() {
    check2();
  }

  function check2() {
    x = null;
  }

  if (x !== null) {
    check1();
    // Flow: No error
    console.log(x.substr(0)); // crashes
  }
}
```
Mitigating with `const` parameters
A low-hanging piece of fruit is to allow a `const` modifier on parameters. This would allow a much faster fix for code that looks like this:
```ts
function fn(const x: string | number) {
  if (typeof x === 'string') {
    thisFunctionCannotMutateX();
    x.substr(0); // ok
  }
}
```
Mitigating with `readonly` fields
The `visitChildren` example above might be mitigated by saying that `readonly` fields retain their narrowing effects even in the presence of intervening function calls. This is technically unsound, as you may have both a `readonly` and a non-`readonly` alias to the same property, but in practice this is probably very rare.
Mitigating with other options
Random ideas that got thrown out (will add to this list) but are probably bad?

- A `pure` modifier on functions that says "this function doesn't modify anything". This is a bit impractical, as we'd realistically want this on the vast majority of all functions, and it doesn't really solve the problem, since lots of functions modify only one thing, so you'd really want to say "pure except for `m`".
- A `volatile` property modifier that says "this property will change without notice". We're not C++, and it's perhaps unclear where you'd apply this and where you wouldn't.
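Purely hypothetical syntax for both ideas, just to make them concrete (neither modifier exists in TypeScript; this is a sketch, not a proposal):

```ts
// Hypothetical 'pure': the function mutates no observable state,
// so existing narrowings survive calls to it
pure function isValidName(s: string): boolean;

// Hypothetical 'volatile': this property may change without notice,
// so narrowings on it never persist across statements
interface ParserState {
  volatile token: Token;
}
```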
Issue Analytics

- Created: 7 years ago
- Reactions: 116
- Comments: 76 (22 by maintainers)
Top GitHub Comments
Looking through the RWC tests, I see only one case where the additional narrowing performed by #9407 causes unwanted errors. However, there were scores of real bugs and inconsistencies being caught, such as checking for the same value twice, assuming that enum members with the value 0 pass truthiness checks, dead branches in `if` and `switch` statements, etc. In aggregate, I think our optimistic assumption that type guards are unaffected by intervening function calls is the best compromise.

BTW, the compiler itself relies on side-effecting changes to a `token` variable in the parser. That pattern becomes an error with #9407 because the compiler continues to think `token` has the value `SyntaxKind.ExportKeyword` following the call to `nextToken` (and thus reports an error when `token` is compared to `SyntaxKind.DefaultKeyword`). We will instead be using a simple function to obtain the current token. Since all modern JavaScript VMs inline such simple functions, there is no performance penalty.

I think this pattern of suppressing type narrowing by accessing mutable state through a function is a reasonable one.
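A plausible sketch of that accessor pattern (identifier names assumed; the actual compiler code differs): reading the token through a function call keeps the checker from carrying a stale narrowing across `nextToken`:

```typescript
enum SyntaxKind { ExportKeyword, DefaultKeyword }

// Mutable parser state, read only through the token() accessor below.
let currentToken = SyntaxKind.ExportKeyword;

// A function call result is never narrowed across statements, so each
// call to token() is treated as a potentially fresh value.
function token(): SyntaxKind {
  return currentToken;
}

function nextToken(): SyntaxKind {
  // ...advance the scanner; here we just flip the token for illustration.
  currentToken = SyntaxKind.DefaultKeyword;
  return currentToken;
}

function parseExportAssignment(): string {
  if (token() === SyntaxKind.ExportKeyword) {
    nextToken();
    // With direct reads of a 'token' variable, #9407-style narrowing would
    // consider this comparison impossible; through token() it checks fine.
    if (token() === SyntaxKind.DefaultKeyword) {
      return 'export default';
    }
  }
  return 'other';
}
```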
There are a few examples here of synchronous side effects in functions, and lots of linked issues about how this causes big problems with async/await, but no direct discussion of the async case that I can see, so it's worth highlighting. In async code, the inference is wrong even when the variable (say, `x`) has an explicit type annotation: a narrowing established before an `await` is assumed to still hold afterwards, even though the awaited operation may have changed the value as a side effect. This makes it very awkward to write a function that uses `await` to wait for a side effect, which I think is a fairly common case. Meanwhile, in the equivalent promise-based code, the types work out fine.