WebAssembly promises to run high-performance code in the browser in a standardized way. Now that a few WebAssembly previews are available, I decided it’s time to take a look at their performance. One source for benchmarks is the well-known Computer Language Benchmarks Game, and I picked nbody (it’s almost four years since I last did so…).
After playing a bit with the results I decided to put the code on GitHub. I’m looking forward to your corrections, improvements and feedback. I’m already curious what the results will look like in a few months…
The following versions were compared:
- webAssembly: A WebAssembly version compiled from the original C version, because this turned out to be faster than the other version I checked
- floatArrayPerObject: Each body’s data is stored in its own typed array
- oneTypedArray: All bodies’ data is stored in a single typed array and the advance function is programmatically unrolled (quite crazy, isn’t it?)
- To get a baseline, the fastest Java version and the original C version were added.
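The oneTypedArray layout can be sketched roughly like this. This is a minimal, non-unrolled sketch, not the actual benchmark code; the 7-doubles-per-body layout and the names are my assumptions:

```javascript
// Sketch of the "oneTypedArray" idea: all bodies live in one Float64Array,
// 7 doubles per body (x, y, z, vx, vy, vz, mass). The real benchmark version
// additionally unrolls the inner loops for a fixed number of bodies.
const SIZE = 7; // doubles per body (assumed layout)

function advance(bodies, n, dt) {
  // update velocities from pairwise gravitational interaction
  for (let i = 0; i < n; i++) {
    const bi = i * SIZE;
    for (let j = i + 1; j < n; j++) {
      const bj = j * SIZE;
      const dx = bodies[bi] - bodies[bj];
      const dy = bodies[bi + 1] - bodies[bj + 1];
      const dz = bodies[bi + 2] - bodies[bj + 2];
      const d2 = dx * dx + dy * dy + dz * dz;
      const mag = dt / (d2 * Math.sqrt(d2));
      bodies[bi + 3] -= dx * bodies[bj + 6] * mag;
      bodies[bi + 4] -= dy * bodies[bj + 6] * mag;
      bodies[bi + 5] -= dz * bodies[bj + 6] * mag;
      bodies[bj + 3] += dx * bodies[bi + 6] * mag;
      bodies[bj + 4] += dy * bodies[bi + 6] * mag;
      bodies[bj + 5] += dz * bodies[bi + 6] * mag;
    }
  }
  // update positions from velocities
  for (let i = 0; i < n; i++) {
    const bi = i * SIZE;
    bodies[bi] += dt * bodies[bi + 3];
    bodies[bi + 1] += dt * bodies[bi + 4];
    bodies[bi + 2] += dt * bodies[bi + 5];
  }
}
```

The point of the flat layout is that all reads and writes hit one contiguous buffer, which is friendlier to the JIT and the cache than one object (or one typed array) per body.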
Here are the results:
The fine print
- Emscripten and Binaryen were installed as described on Compile Emscripten from Source. (emcc (Emscripten gcc/clang-like replacement) 1.37.2 (commit 70d370296036cc5046083a3e215cb605c90e004b))
- The C source code was compiled with this command:
emcc nbody.c -O3 -s WASM=1 -s SIDE_MODULE=1 -o nbody.wasm
- The C version was compiled with gcc -O3 nbody.c -o nbody (where gcc is Apple LLVM version 8.0.0 (clang-800.0.42.1))
This version took 4.4 seconds on my machine and was faster than the fastest C version from the shootout, compiled with gcc -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3 nbody_fastest.c -o nbody_fastest, which took 4.9 seconds on my machine.
All tests were performed on a 2015 MacBook Pro, 2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3. For all tests the best of three runs was selected for the result.
Welcome to another round of the js web framework benchmark.
Once again a lot has happened:
- Elm, Knockout, Nx, Binding.scala, Dio, Polymer, Simulacra and Svelte are new in this round.
- Most other frameworks have been updated to their latest version (though it took a bit to write this blog post, so they might be outdated again by now…)
- There are now two result tables: one for “keyed” implementations and one for “non-keyed”. I’ve written a separate blog post about that. The classification as keyed or non-keyed is not complete yet; it is based only on the current benchmark implementation and does not indicate that the framework doesn’t support the other mode. Feel free to send me pull requests!
- The “clear rows a 2nd time” benchmark has been removed. It showed a (maybe theoretical) issue in react that is fixed by now.
I’d really like to thank all contributors (at the time of writing there are 33 of them). Without you it would be impossible to make such a comparison.
The results are here:
Key observations (pun intended):
- It’s still the case that frameworks are getting faster. Riot 3 is a big step in terms of performance in this benchmark.
- Ember is included again in this round and 2.10-beta is a big improvement.
- I’ll really have to rework my vanilla.js implementation if it is to remain the fastest implementation 😉
- Apart from that many frameworks are in a performance range that should be fine for many requirements. Vue.js, kivi, vidom, plastiq, domvm and bobril in the keyed table show little weakness.
- As for the non-keyed frameworks the results are much closer. Dio, inferno and svelte are incredibly fast, domvm and Vue.js are only a bit behind.
You can click through the implementations in your browser here. Please keep in mind that all durations printed on the browser console are just approximations, but they are good enough to get a feel for the difference between fast and slow implementations. The real results are gathered from the Chrome timeline by the test driver in webdriver-ts.
The next round of the js-framework-benchmarks will make a distinction between “keyed” and “non-keyed” implementations. This blog post explains what it means and why this distinction is relevant.
Continue reading “JS web frameworks benchmark – keyed vs. non-keyed”
Here’s another update for the js web framework benchmark. This time the benchmark has seen lots of contributions:
- Dominic Gannaway updated and optimized inferno
- Boris Kaul added the kivi framework
- Chris Reeves contributed the edge version of ractive
- Michel Weststrate updated react-mobX
- Gianluca Guarini updated the riot benchmark
- Gyandeep Singh added mithril 1.0-alpha
- Leon Sorokin contributed domvm
- Boris Letocha added bobril
- Jeremy Danyow rewrote the aurelia benchmark
- Filatov Dmitry updated the vidom benchmark
- Dan Abramov, Mark Struck, Baptiste Augrain and many more…
Thanks to all of you for contributing!
Continue reading “JS web frameworks benchmark – Round 4”
Here’s what’s new:
- Added cycle.js v6 and v7. What a difference the new version makes!
- Added inferno.js. This small framework made writing a faster vanillajs version challenging.
- Updated all frameworks to their current version.
- New benchmarks: Append 10,000 rows, Remove all Rows, Swap two rows.
- Added two benchmarks that measure memory consumption directly after loading the page and when 1,000 rows are added to the table.
The result table for chrome 51 on my MacBook Pro 15 reveals some surprising results:
(Update: cycle.js v7 was reported with an average slowdown of 2, but I had forgotten to remove a logging statement. Without it, the slowdown improved further to a factor of 1.8. The table has been updated.)
- Vanillajs is fastest, but not by much, and only through hard work.
- Inferno.js is an incredibly fast framework and only 1.3 times slower than vanillajs, which is simply amazing.
- plastiq comes in as the second fastest framework being just 1.5 times slower, followed by vidom.
- React.js appears to have a major performance regression in the clear-rows benchmarks. (Clearing all rows a third time is much faster. Two benchmarks point to that regression, which is a bit unfair to react.js.)
- Cycle.js v6 was slowest by far; v7 changes that completely and leaves the group of slow frameworks consisting of ember, mithril and ractive.
All source code can be found in my GitHub repository. Contributions are always welcome (ember and aurelia are looking for some loving care, but I have to admit that I have no feelings for them. And I’d like to see a version of cycle.js with isolates).
Thanks a lot to Baptiste Augrain for contributing additional tests and frameworks!
Here’s a follow-up to my last blog post since there’s good reason for an update:
- It turned out that ember performs better with key=”identifier” in the #each helper.
- One of the vue.js creators contacted me and told me that they fixed vue.js in response to my benchmark.
- Two other react-like libraries were added: Preact and react-lite
Here’s the new measurement showing the duration in milliseconds:
And the new conclusions:
- Ember got much faster for “partial update” (from 164 msecs to 58 msecs), but is still quite slow for create or update 1000 rows.
- Vue.js improved a lot for the “update 1000 rows (hot)” benchmark (from 435 msecs to 260 msecs)
- Preact leaves quite a good impression. It’s a lot faster than react for create 1000 rows and update 1000 rows and not much slower for the rest.
- Almost the same can be said about react-lite though the performance for “remove row” is rather weak.
All in all, angular 2, vidom and preact especially impress with their performance. Still, aurelia feels much faster in the browser than in the selenium tests (except for startup duration, which might be my fault for not using the bundler correctly). And yes, vue.js is now pretty close to the fastest frameworks.
You can view the benchmarks in the browser and find all source code on github.
One approach to measuring performance would be to use browser tools like the Chrome timeline, which reveals exact timings but has the disadvantage of being a manual, time-consuming process that yields only a single sample.
At first I tried automated benchmarking tools such as Benchpress or protractor-perf, but I didn’t really understand the results and thus decided to roll my own selenium webdriver benchmark. I wrote an additional blog entry to describe this approach. In short, it measures the duration from the start of a DOM click event to the end of the repainting of the DOM by parsing the performance log. To reduce sampling artifacts it takes the average of running each benchmark 12 times, ignoring the two worst results.
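The sampling scheme described above can be sketched like this (the helper name is mine, not the actual test driver code):

```javascript
// Aggregate benchmark samples: sort ascending (fastest first), drop the
// `drop` slowest samples to reduce outlier noise, and average the rest.
function aggregate(durations, drop = 2) {
  const sorted = [...durations].sort((a, b) => a - b); // fastest first
  const kept = sorted.slice(0, sorted.length - drop);  // discard the slowest `drop`
  return kept.reduce((sum, d) => sum + d, 0) / kept.length;
}
```

With 12 runs per benchmark and drop = 2, a single garbage-collection pause or background-tab hiccup in one or two runs does not distort the reported average.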
Continue reading “JS web frameworks benchmark – Round 1”
What to measure
Before running a benchmark one should be clear about what to measure. In this case I wanted to know which framework is faster for a few test cases. I knew which test cases and which frameworks, but that left unclear what faster actually means. Let’s take a look at a chrome timeline:
Continue reading “Benchmarking JS-Frontend Frameworks”
A single benchmark doesn’t prove anything, so why not add another well-known benchmark? I took NBody (as I always do on this blog ;-)).
The results for NBody confirmed those results. For C I took the fastest plain C implementation from the Computer Language Benchmarks Game. Once again the y-axis shows the duration (this time in seconds).
- “Slower” and “faster” usually cause headaches in benchmarks (there’s a nice paper about that: http://hal.inria.fr/docs/00/73/92/37/PDF/percentfaster-techreport.pdf). I stuck with elapsed time, so that e.g. 42% slower means that the ratio of the durations was 1.42.
- On the MacBook Pro, C was compiled with clang using -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3 for x64. Java was Oracle Hotspot 1.8.0-ea-b87 on 64 bit (thus C2 aka Server Hotspot). Chrome was 28.0.1493.0, but the 32 bit version. I tried to compile V8 myself, but both the x86 and x64 custom-built V8 were significantly slower than Chrome, so I stuck with Chrome.
- On the iPhone I used a release configuration using clang with (among others) -O3 -arch armv7
- The Google Nexus 7 runs Android 4.2.2, Chrome 26.0.1410.58. C was compiled with -march=armv6 -marm -mfloat-abi=softfp -mfpu=vfp -O3.
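The slowdown convention from the fine print can be written down explicitly (the helper names are mine, for illustration only):

```javascript
// "42% slower" means the measured duration was 1.42 times the baseline.
function slowdownFactor(duration, baseline) {
  return duration / baseline;
}

function percentSlower(duration, baseline) {
  return (slowdownFactor(duration, baseline) - 1) * 100;
}
```

So a run of 142 ms against a 100 ms baseline is a factor of 1.42, i.e. 42% slower.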
The Java virtual machine recently introduced a very interesting optimization that eliminates some object allocations. This optimization is called scalar replacement and depends on escape analysis. You can read more about it in an article by Brian Goetz.
Simply put, when an object is identified as non-escaping, the JVM can replace its allocation on the heap with an allocation of its members on the stack, which mitigates the lack of user-guided stack allocation. The optimization has been enabled by default in the HotSpot server compiler since JDK 6u23.
Continue reading “Scalar replacement: Automatic stack allocation in the java virtual machine”