Improve performance by compiling to WebAssembly
I was able to prototype a hacked-together variant of the sucrase-babylon parser and get it working in AssemblyScript. I ran it through a benchmark that compiled a ~100-line file over and over and counted the number of tokens, and it gave the correct answer both when run from JS and when run from wasm. Here were the results:
| Iterations | JS time taken | wasm time taken |
| --- | --- | --- |
| 1 | 10.065ms | 0.192ms |
| 10 | 19.173ms | 1.004ms |
| 100 | 72.770ms | 9.074ms |
| 1000 | 182.896ms | 72.428ms |
| 10000 | 594.523ms | 899.885ms |
| 100000 | 4180.687ms | 8830.265ms |
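The shape of a benchmark loop like this can be sketched in plain TypeScript. Note that `parseTokens` below is a trivial, hypothetical stand-in rather than the actual sucrase-babylon entry point; only the timing-and-counting structure is meaningful:

```typescript
// Minimal sketch of the benchmark loop. parseTokens is a toy stand-in
// for the real parser so the sketch stays self-contained.
type Token = { type: string };

// Trivial whitespace "tokenizer" -- NOT the sucrase-babylon parser.
function parseTokens(source: string): Token[] {
  return source.split(/\s+/).filter(Boolean).map((w) => ({ type: w }));
}

// Parse the same source `iterations` times, returning elapsed time and
// the token count from the final pass.
function benchmark(source: string, iterations: number): { ms: number; tokens: number } {
  let tokens = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    tokens = parseTokens(source).length;
  }
  return { ms: Date.now() - start, tokens };
}

const result = benchmark("let x = 1 ;", 100);
```

In the real experiment the same loop body would call into either the JS build or the wasm instance, with the token counts compared to check that both parses agree.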
As expected, the wasm running time grows linearly, while the JS running time grows sub-linearly, at least at small iteration counts, since V8 has more and more time to optimize the code, whereas wasm optimizations all happen at compile time. The wasm code is about 50x faster on a very small case, roughly equal at around 50,000 lines of code, and about 2x slower at the extremes. So it looks like, at least for now, V8 can do a really good job of optimization when given long enough, enough to outperform the equivalent code compiled through AssemblyScript.
Sadly, this means WebAssembly isn’t a slam dunk like I was hoping, but it still seems promising. There’s probably room for improvement in the use of WebAssembly (tricks that would be impossible to implement in plain JS and unreasonable for V8 to infer), and AssemblyScript and WebAssembly may both improve over time, so there’s reason to believe WebAssembly will eventually win out for long-running programs after all. This test case may also favor V8, since it processes exactly the same dataset every time, so the branch and call statistics are identical on every iteration. It also only exercises about half of the Sucrase codebase, and the other half (determining and performing the string replacements) may run better in wasm compared with JS.
My plan now is to get Sucrase into a state where it can both run as JS and be compiled to WebAssembly by AssemblyScript. That should at least make it a good test bed for this work.
Another good thing to prototype would be porting the parser to Rust and/or C++, seeing how it performs as native code vs. wasm, and seeing how that wasm compares with the wasm produced by AssemblyScript.
Issue Analytics
- State:
- Created: 5 years ago
- Reactions: 9
- Comments: 6 (3 by maintainers)
Top GitHub Comments
After working through more details to get the Sucrase parser working in full, I tried this again on a more realistic dataset, and the results seem more promising now! It’s a TypeScript/JSX codebase of about 4000 files and about 550,000 lines, and the task was to run the Sucrase parser to count the total number of tokens in all files. I confirmed that the number of tokens was the same for JS and wasm, so hopefully this means that both are indeed doing a full, correct parse.
Here are the numbers:
As expected, V8 runs JS better on larger datasets since it has more time to identify hot code paths and compile them with good optimizations. It looks like JS still does slightly better at the largest scale, around 50 million lines of code, but even then the difference is small. In my own use cases, the typical scale is 1000-4000 files, so at least here, I’d expect a 2-3x speedup. It’s unclear if the improvements are due to the more realistic dataset or improvements in AssemblyScript, but it looks like AssemblyScript will improve perf, especially on smaller datasets.
So this seems like a good enough justification to get the code fully working in AssemblyScript, including using `i32` types instead of `number` and things like that. I’m still hoping to get the code into a state where the same codebase can run both, especially because debugging JS seems much nicer than debugging wasm.

FWIW, we have successfully used sucrase with nodegun to get rid of the node start time and the JIT warmup time.
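A dual-target file of the kind described above could be sketched as follows. This is an assumption about how such a file might look, not Sucrase's actual code: under asc, `i32` is a built-in 32-bit integer type, while for the plain-TypeScript build it is aliased to `number` (AssemblyScript's "portable" style uses a similar aliasing approach):

```typescript
// Alias for the JS/TypeScript build; under the AssemblyScript compiler
// this alias would be omitted and the built-in i32 type used instead.
type i32 = number;

// Toy stand-in for a parser hot path: count ASCII identifier-start
// characters (letters, _, $) using only integer arithmetic and
// charCodeAt, which both tsc and asc can handle.
function countIdentifierStarts(source: string): i32 {
  let count: i32 = 0;
  for (let i: i32 = 0; i < source.length; i++) {
    const c: i32 = source.charCodeAt(i);
    if (
      (c >= 0x41 && c <= 0x5a) || // A-Z
      (c >= 0x61 && c <= 0x7a) || // a-z
      c === 0x5f ||               // _
      c === 0x24                  // $
    ) {
      count++;
    }
  }
  return count;
}
```

Sticking to this integer-and-char-code subset is what makes a single codebase plausible for both targets, at the cost of forgoing JS conveniences like regular expressions in the hot paths.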