WebAssembly promises a standardized way to run high-performance code in the browser. Now that a few WebAssembly previews are available, I decided it was time to take a look at their performance. One source of benchmarks is the well-known Computer Language Benchmarks Game, and I decided to pick nbody (it’s been almost four years since I last did so…).
After playing with the results for a bit, I decided to put the code on GitHub. I’m looking forward to your corrections, improvements and feedback. I’m already excited to see what the results will look like in a few months…
The following versions were compared:
- webAssembly: A WebAssembly version compiled from the original C version, because this turned out to be faster than the other version I checked.
- object: The fastest JavaScript version from the Computer Language Benchmarks Game. It uses a JavaScript object for each body to store the data.
- arrayPerObject: Each body’s data is stored in a plain JavaScript array.
- floatArrayPerObject: Each body’s data is stored in a typed array.
- oneTypedArray: All bodies’ data is stored in a single typed array, and the advance function is programmatically unrolled (quite crazy, isn’t it?).
- To get a baseline, the fastest Java version and the original C version were added.
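To make the difference between these data layouts more concrete, here is a minimal sketch. The field order, the helper names and the choice of Float64Array are assumptions made for illustration; they are not taken from the benchmark code.

```javascript
// Hypothetical field order per body: x, y, z, vx, vy, vz, mass.
const FIELDS = 7;

// "object" layout: one plain object per body.
function makeObjectBody(x, y, z, vx, vy, vz, mass) {
  return { x, y, z, vx, vy, vz, mass };
}

// "floatArrayPerObject" layout: one typed array per body.
function makeTypedBody(x, y, z, vx, vy, vz, mass) {
  return Float64Array.of(x, y, z, vx, vy, vz, mass);
}

// "oneTypedArray" layout: every body packed into a single typed array,
// body i occupying the slots [i * FIELDS, (i + 1) * FIELDS).
function packBodies(bodies) {
  const packed = new Float64Array(bodies.length * FIELDS);
  bodies.forEach((b, i) => {
    packed.set([b.x, b.y, b.z, b.vx, b.vy, b.vz, b.mass], i * FIELDS);
  });
  return packed;
}

// Example of working on the packed layout: total kinetic energy.
function kineticEnergy(packed) {
  let energy = 0;
  for (let o = 0; o < packed.length; o += FIELDS) {
    const vx = packed[o + 3];
    const vy = packed[o + 4];
    const vz = packed[o + 5];
    energy += 0.5 * packed[o + 6] * (vx * vx + vy * vy + vz * vz);
  }
  return energy;
}
```

With the packed layout every access is an index computation on one array, which is what makes programmatic unrolling of the inner loop possible in the first place.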
Here are the results:
Firefox does pretty well. The WebAssembly implementation is the fastest browser version and close to the Java baseline, but the pure JavaScript implementations aren’t far behind. It seems JavaScript VMs are already pretty good at simple numeric code.
In the other browsers, WebAssembly couldn’t beat the JavaScript versions yet. And Safari has a completely different idea of which JavaScript version it can optimize best.
The fine print
Browser versions:
- Chrome Canary, 58.0.3004.0, invoked with
--js-flags="--turbo --trace-opt --trace-deopt --trace-bailout"
for turbofan and
--js-flags="--trace-opt --trace-deopt --trace-bailout"
for crankshaft.
- Firefox 53.0a2 (2017-02-06) (64-Bit)
- Safari Technology Preview Release 22 (Safari 10.2, WebKit 12604.1.4.2)
WebAssembly setup:
- Emscripten and Binaryen were installed as described on Compile Emscripten from Source. (emcc (Emscripten gcc/clang-like replacement) 1.37.2 (commit 70d370296036cc5046083a3e215cb605c90e004b))
- The C source code was compiled with this command:
emcc nbody.c -O3 -s WASM=1 -s SIDE_MODULE=1 -o nbody.wasm
C compiler:
- The C version was compiled with gcc -O3 nbody.c -o nbody (where gcc is Apple LLVM version 8.0.0 (clang-800.0.42.1)).
This version took 4.4 seconds on my machine and was faster than the fastest C version from the shootout, compiled with gcc -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3 nbody_fastest.c -o nbody_fastest, which took 4.9 seconds on my machine.
Infrastructure:
All tests were performed on a 2015 MacBook Pro, 2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3. For all tests the best of three runs was selected for the result.
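The "best of three runs" selection can be sketched roughly as follows. The helper name bestOf and the injectable clock are assumptions for illustration and not the harness actually used for these measurements; the default clock relies on performance.now(), which is available in browsers and recent Node.js versions.

```javascript
// Hypothetical helper: call `fn` `runs` times and keep the fastest time.
// `now` is injectable so the helper can be exercised with a fake clock.
function bestOf(fn, runs = 3, now = () => performance.now()) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    times.push(now() - start); // elapsed time in the clock's unit (ms by default)
  }
  return { best: Math.min(...times), times };
}
```

Returning all individual times alongside the best one makes it easy to report the spread between runs later on.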
Nice work! Do you have any data about the variance between runs? It’s not clear from the chart how significant the differences are.
I collected three runs for each browser and version and took the best run. Data varies with each run of course, but I believe (without statistical proof) that the best run rounded to one decimal is acceptable.
Here’s an example for the WebAssembly data (in milliseconds, “german” number format: “.” separates thousands, “,” marks the decimals):
webAssembly            run 1      run 2      run 3
safari                 8.481,9    8.406,60   8.477,90
firefox                5.860,16   6.141,81   6.088,67
chrome (turbofan)      8.015,82   8.062,62   7.809,40
chrome (crankshaft)    7.806,36   7.997,36   8.012,83
Nice benchmark! It would also be interesting to see the memory usage.
Can you also test Microsoft Edge, included with Insider builds >= 15019, which also has WebAssembly?
I would strongly recommend doing considerably more than three runs and reporting the average and standard deviation for each version. Taking only the best result is not really representative of real-world performance.
If possible, I recommend not using pre-release browser engines for performance benchmarks. I don’t know specifically about WebAsm or about Firefox, but Safari Tech Preview and Chrome Canary are generally not as fast as their production-release counterparts.
I’m afraid I don’t have a suitable Windows installation on my Mac. But I’d also love to know how well Edge is doing.
I second Fancher’s suggestion. Additionally, you may want to state how many iterations each run performs, which helps to isolate the cache performance properties inherent in a benchmark. Many benchmark platforms use thousands of iterations per run to do this.
How is asm.js doing on the same setup?
It would be interesting to know, I think.
Thanks for putting together this benchmark! I ran it on my standalone WebAssembly VM (https://github.com/AndrewScheidecker/WAVM):
9488ms – native (VC 2015)
9669ms – WAVM
10014ms – Firefox 54.0a1
WAVM uses LLVM to generate code, so it’s not too surprising that it gets close to native performance. There are still some inefficiencies in how WAVM maps WebAssembly into LLVM IR, but overall it will generate code closer to an offline compiler than a browser JIT. The remaining difference should just be sandboxing overhead. Hopefully browsers can eventually match the performance of the code generated by WAVM without the expensive LLVM codegen!
– I agree, 3 runs is not enough.
– What about the compile time and the size? I quote: “WebAssembly or wasm is a new portable, size- and load-time-efficient format suitable for compilation to the web.”…. those are the reasons why wasm exists. Can you provide your experience on that end?
These tests do not consider the time spent parsing and compiling JavaScript, which might be the real performance issue for modern web apps.
[JavaScript Start-up Performance](https://medium.com/@addyosmani/javascript-start-up-performance-69200f43b201#.ynbzkmypo)
If you want to estimate the standard deviation from just 3 runs, you can try using the unbiased standard deviation: https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
Essentially, that means scaling the sample standard deviation s = √(Σ(xᵢ − x̄)² / (n − 1)) by a correction factor that accounts for the fact that you’re heavily under-sampling the distribution. For 3 values the factor is 1 / 0.886 ≈ 1.129, so σ ≈ 1.129 · √(½ · Σ(xᵢ − x̄)²).
For your Firefox example (5.860,16 6.141,81 6.088,67) that would be about 168,85 ms, or roughly 0.17 seconds.
If you want to know the number of runs to do, calculate that unbiased standard deviation and increase the number of runs until interesting values differ by at least 2x the largest unbiased standard deviation of the values.
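The estimate and the stopping rule above could be sketched like this. The function names are made up for illustration, the correction factors are the standard c4 values for n = 2, 3 and 4 (which assume roughly normally distributed timings), and the sample data are the WebAssembly runs quoted earlier in the thread, written with a decimal point.

```javascript
// c4 correction factors for small samples, assuming normally distributed data.
const C4 = { 2: 0.7979, 3: 0.8862, 4: 0.9213 };

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation with the usual n - 1 denominator.
function sampleStd(xs) {
  const m = mean(xs);
  const squares = xs.reduce((sum, x) => sum + (x - m) * (x - m), 0);
  return Math.sqrt(squares / (xs.length - 1));
}

// Sample standard deviation divided by c4 (the "unbiased" estimate).
function unbiasedStd(xs) {
  const c4 = C4[xs.length];
  if (c4 === undefined) {
    throw new Error("no c4 factor tabulated for n = " + xs.length);
  }
  return sampleStd(xs) / c4;
}

// Stopping rule from the comment: two results count as clearly different
// once their means are at least 2x the larger unbiased std apart.
function clearlyDifferent(a, b) {
  const spread = Math.max(unbiasedStd(a), unbiasedStd(b));
  return Math.abs(mean(a) - mean(b)) >= 2 * spread;
}

const firefox = [5860.16, 6141.81, 6088.67];
const safari = [8481.9, 8406.6, 8477.9];
const turbofan = [8015.82, 8062.62, 7809.4];
const crankshaft = [7806.36, 7997.36, 8012.83];

console.log(unbiasedStd(firefox).toFixed(1));    // spread of the Firefox runs in ms
console.log(clearlyDifferent(firefox, safari));      // Firefox vs. Safari
console.log(clearlyDifferent(turbofan, crankshaft)); // the two Chrome configurations
```

If clearlyDifferent returns false for a pair you care about, the rule says to collect more runs for that pair; the C4 table would then need entries for larger n.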