Performance Optimizations (including Rust/AssemblyScript -> WASM)
This document will be a work in progress while I figure out a game plan for how and where efforts can best be focused for performance increases.
I have experimented with benchmarking the squash() and sigmoid()/sigmoidDerivative() functions rewritten in both Rust and AssemblyScript and compiled to WASM.
Rust -> WASM showed a very large (~4.5x) performance increase versus JS on Chrome with squash() given 1,000,000 inputs. On Firefox, the performance was roughly the same. Still need to benchmark Rust WASM on Node.
AssemblyScript -> WASM was ~2.5x as performant with squash() given 1,000,000 inputs on Node. Still need to benchmark AssemblyScript WASM on browsers.
I will create a repo for the test functions and post the source code here, and will soon upload screenshots of the console output showing the benchmark measurements.
@postspectacular gave some feedback and insane performance tuning on the basic implementation I had in AssemblyScript as well:
https://gist.github.com/postspectacular/3dccbfed1b753edadf1b6fee8add4808#file-03-sigmoid-simd-ts-L1
He dropped down to low-level memory/pointers, and even to SIMD instructions (though from his comment, it seems there are limitations in WASM on the V8 engine that mean SIMD is not quite supported yet). So there is obviously much room to improve on these benchmarks. I am not familiar with Rust, AssemblyScript, or even low-level memory concepts on the whole. Perhaps some other users might be able to give feedback or advice here?
The best way to approach this is probably:
- Profile the project while it runs and see which functions are called most. By the Pareto Principle (80-20 rule), it is likely that a small handful of methods account for the vast majority of calls. These are where efforts should be focused.
- Create a way to benchmark these functions with mock data in an isolated environment. Testing without this will be really difficult.
- Experiment with re-writing the most used functions in Rust/AssemblyScript. Benchmark, consider results.
- Look into using tooling from existing libraries to make things easier. Especially when it comes to math and matrix stuff.
- Read into using GPU.js and WebGL to represent array data as shaders. Umbrella has tooling for this too, I believe.
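The isolated-benchmark step in the list above could look roughly like the following. This is only a sketch under stated assumptions: the names `Kernel`, `makeMockData`, `bench`, and `sigmoidInPlace` are hypothetical and do not come from the project, the mock data is assumed to be uniform in [-1, 1), and a plain-JS sigmoid stands in for whichever hot function profiling actually identifies.

```typescript
// Hypothetical harness: none of these names come from the project itself.
type Kernel = (data: Float64Array) => void;

interface BenchResult {
  name: string;
  iterations: number;
  totalMs: number;
  meanMs: number;
}

// Mock inputs: uniformly distributed values in [-1, 1) (an assumption).
function makeMockData(length: number): Float64Array {
  const data = new Float64Array(length);
  for (let i = 0; i < length; i++) data[i] = Math.random() * 2 - 1;
  return data;
}

function bench(
  name: string,
  kernel: Kernel,
  input: Float64Array,
  iterations: number,
  warmup: number = 5
): BenchResult {
  // Warm-up runs give the JIT a chance to compile the kernel before timing.
  for (let i = 0; i < warmup; i++) kernel(input.slice());

  let totalMs = 0;
  for (let i = 0; i < iterations; i++) {
    // Fresh copy per run so every iteration sees identical input.
    // The copy happens outside the timed region.
    const copy = input.slice();
    const start = performance.now();
    kernel(copy);
    totalMs += performance.now() - start;
  }
  return { name, iterations, totalMs, meanMs: totalMs / iterations };
}

// Stand-in kernel: the logistic sigmoid applied in place.
const sigmoidInPlace: Kernel = (data) => {
  for (let i = 0; i < data.length; i++) {
    data[i] = 1 / (1 + Math.exp(-data[i]));
  }
};

const result = bench("sigmoid (JS)", sigmoidInPlace, makeMockData(100000), 20);
console.log(`${result.name}: ${result.meanMs.toFixed(3)} ms per iteration`);
```

A WASM candidate would be wrapped in the same `Kernel` signature so that both versions run against the same mock data and the same iteration count.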
Top GitHub Comments
@GavinRay97 - no worries, i just felt like I needed to contextualize these snippets some more… all good!
@MaxGraey - thanks for reminding me of wasmstudio 😃 - I’ve uploaded some of my code from that gist there too, and the results there are still similar to what I found in Node earlier (and contradictory to yours). But the only difference between the two benchmarked fn’s is their version of exp(): https://webassembly.studio/?f=kg1rea6qvbn
(@GavinRay97 et al - once the project sandbox has initialized, first open the browser console, then press “build & run”, timing info will be shown in console only… tests take ~12 sec altogether)
The tests are currently set up to work on normalized input data, executing 500 iterations on 1 million values each. The JS glue code is in /src/main.js. The AssemblyScript is in /assembly/main.ts.
On my laptop (MBP2015, Chrome 78) I’m getting:
bench("sigmoidApproxPtr"); // 811.406005859375ms (100 iters) // 4046.489013671875ms (500 iters)
bench("sigmoidNatPtr"); // 1504.90380859375ms (100 iters) // 7569.575927734375ms (500 iters)
Getting consistent results after multiple runs… food for thought, I think! And please, this is NO upstaging attempt, whatsoever!
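For readers who have not opened the sandbox, here is a rough sketch of the kind of trade-off being measured, written in plain TypeScript rather than AssemblyScript. The approximation below is a generic one based on the limit (1 + x/n)^n and is an assumption for illustration only. It is not necessarily the approximation used in the gist, and all function names are hypothetical.

```typescript
// Approximates e^x via (1 + x/n)^n with n = 1024 = 2^10.
// Illustrative only: not necessarily what the gist's approximate version does.
function expApprox(x: number): number {
  let y = 1 + x / 1024;
  // Ten successive squarings raise y to the 1024th power.
  for (let i = 0; i < 10; i++) y *= y;
  return y;
}

// Sigmoid using the built-in exp().
function sigmoidNative(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// Sigmoid using the approximate exp().
function sigmoidApprox(x: number): number {
  return 1 / (1 + expApprox(-x));
}

// Largest absolute difference between the two sigmoids over
// evenly spaced samples in [lo, hi].
function maxAbsError(lo: number, hi: number, samples: number): number {
  let worst = 0;
  for (let i = 0; i < samples; i++) {
    const x = lo + ((hi - lo) * i) / (samples - 1);
    worst = Math.max(worst, Math.abs(sigmoidNative(x) - sigmoidApprox(x)));
  }
  return worst;
}

// Assumes "normalized" means inputs in [-1, 1].
console.log(maxAbsError(-1, 1, 1001));
```

The error of this particular approximation grows with the magnitude of x, so whether the speed-up is worth it depends on the input range the network actually sees.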
Also plz don’t use Math.pow(x, 2). Currently we can’t optimize this to x * x, but that will come in the future. You can absolutely safely replace x ** 2 with x * x, and this will increase speed even more.
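A minimal sketch of that substitution, written in plain TypeScript. The helper `sumSquaredError` is hypothetical and only illustrates where the `x * x` form would go. It is not taken from the project or the gist.

```typescript
// Hypothetical helper, shown only to illustrate the x * x substitution.
function sumSquaredError(predicted: Float64Array, target: Float64Array): number {
  if (predicted.length !== target.length) {
    throw new RangeError("predicted and target must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < predicted.length; i++) {
    const diff = predicted[i] - target[i];
    // Plain multiplication instead of Math.pow(diff, 2) or diff ** 2.
    sum += diff * diff;
  }
  return sum;
}

console.log(sumSquaredError(new Float64Array([1, 2, 3]), new Float64Array([1, 0, 5])));
```

Storing the difference in a local first keeps the subtraction from being evaluated twice.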