Java vs. C benchmark #2: JET, harmony and GCJ

In one of the comments on my Java vs. C benchmark, Dmitry Leskov suggested including Excelsior JET. JET has an ahead-of-time compiler and is known to greatly reduce startup time for Java applications. I’ve kept an eye on JET since version 4 or so, and while its startup time has always been excellent, the peak performance of the HotSpot server compiler was better. With JET 5, performance for e.g. SciMark has improved greatly, so I decided to rerun the benchmark for JET 5 and JET 6 beta 2. JET 6 beta 2 is currently available on Windows only, so those tests were run under Windows Vista; JET 5 (and all other VMs) ran under Ubuntu. I also benchmarked JET 5 on Windows to check for a large OS-related difference, but the results were within 2.4% (though a t-test still showed a significant difference). For simplicity I decided to publish only the Ubuntu JET 5 results. I’ll update the results when beta 3 becomes available for Linux.

Another interesting VM is Apache Harmony. It is designed to be a complete open-source JDK and received a lot of attention when it started (though it has become pretty quiet nowadays). It began before Sun decided to release their JDK under the GPL, so if nothing else, Harmony was, in my opinion, one of the reasons we have Sun’s OpenJDK project now. Harmony’s VM is based on an Intel donation, and that alone makes benchmarking it interesting. Of course, Harmony is still in its early stages, and it would be almost a miracle if Harmony 1.0 could beat the performance of Sun’s JDK.

The third VM is also an ahead-of-time compiler. GCJ is a Java frontend for GCC, so it shares GCC’s backend and can in principle produce code of roughly the same quality. GCJ doesn’t get much publicity despite its effort to become a really usable JVM. Combined with the GIJ interpreter and the gcj-dbtool, GCJ is able to compile even complex applications like Eclipse. GCJ uses GNU Classpath as its underlying implementation of the JDK classes, which means some parts of the JDK are still missing. I decided to use the Ubuntu GCC 4.3 snapshot, as it turned out to work best on my PC.


Update: Please read the Java vs. C benchmark for a description of the benchmarks. Just a quick reminder: The numbers show the peak performance after a warmup run.
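To illustrate the methodology, here is a minimal sketch of how such a warmup-then-measure harness can look. The class and method names are mine, not those of the actual benchmark harness, and the workload is just a stand-in for the real benchmarks:

```java
// Minimal sketch of a warmup-based timing harness (illustrative only).
// The first run lets the JIT compile the hot code; the second, timed
// run then measures peak performance rather than startup/compilation.
public class WarmupTimer {

    // Stand-in workload; the real benchmarks (mandelbrot, fannkuch, ...)
    // would go here instead.
    static double workload(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += 1.0 / ((double) i * i);  // converges to pi^2/6
        }
        return sum;
    }

    public static void main(String[] args) {
        workload(10_000_000);                 // warmup run: triggers JIT compilation
        long start = System.nanoTime();
        double result = workload(10_000_000); // timed run: peak performance
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " duration=" + millis + " ms");
    }
}
```

For an ahead-of-time compiler like JET or GCJ the warmup run matters much less, since the code is already native before the first iteration.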

I used JET with the global optimizer and chose aggressive inlining. Excelsior told me that neither setting should matter for these benchmarks, but I didn’t verify that.

Harmony was run in the server mode by passing “-Xem:server”.

For GCJ I chose either -mfpmath=sse or -mfpmath=387, depending on which setting had performed better for GCC. The other options were “-march=native -msse2 -O3 -funroll-all-loops -fomit-frame-pointer -funsafe-math-optimizations -ffast-math -fstrict-aliasing -fwhole-program”. In all benchmarks except mandelbrot I additionally chose “-fno-store-check -fno-bounds-check”. These settings disable array store and bounds checking, so I even allowed aggressive optimizations that violate the JLS. This might be considered unfair, because the other VMs have to perform those checks (though a good optimizer should be able to eliminate many of them; HotSpot, for example, seems to fail at fully removing the array bounds checks in the fannkuch benchmark). On the other hand, if you know that your code throws neither ArrayIndexOutOfBoundsException nor ArrayStoreException, you might be interested in taking the performance advantage.
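For reference, here is how those flags combine on a gcj command line. The source file and main class name are placeholders, not the actual benchmark sources:

```shell
# Hypothetical example; Fannkuch.java and the class name are placeholders.
# -fno-store-check / -fno-bounds-check drop the JLS-mandated array checks.
gcj -march=native -msse2 -mfpmath=sse -O3 -funroll-all-loops \
    -fomit-frame-pointer -funsafe-math-optimizations -ffast-math \
    -fstrict-aliasing -fwhole-program \
    -fno-store-check -fno-bounds-check \
    --main=Fannkuch -o fannkuch Fannkuch.java
```

GCJ’s --main option selects the class whose main method becomes the entry point of the native executable.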



GCJ is faster than GCC (significantly according to a t-test), but let’s not forget that it’s GCC 4.2 vs. GCJ 4.3 and GCC performance varies quite a bit across versions.
JET 6 is faster than JET 5 but both are slightly behind Sun’s JDK. Harmony is about 26% slower than JDK 6. Here’s the picture (all diagrams display the duration in milliseconds, i.e. smaller is better of course):

mandelbrot benchmark: duration in milliseconds


Both JET versions and GCJ are able to deliver good performance. Running GCJ with store and bounds checks enabled made it 44% slower. Harmony is far behind, being 60% slower than JET 6.

spectralnorm benchmark: duration in milliseconds


Once again JET and GCJ perform very well. JET 6 was even the fastest, beating JET 5 by 31% – but keep in mind that JET 6 ran on Windows Vista while all the others ran on Ubuntu Linux.
I also tested GCJ with store and bounds checking enabled. It was around 20% slower than the GCJ version without checks.
Harmony had a real problem with this benchmark: it took almost 28 times longer to finish than JET 6. Harmony with the option -Xem:client was a little quicker, but more than 260 seconds is still fairly bad.

nbody benchmark: duration in milliseconds


We saw last time that fannkuch was one of the most interesting benchmarks, with widely spread results. GCJ 4.3 was a little quicker than GCC 4.2. Enabling store and bounds checks made GCJ 34% slower. JET 6 shows a clear improvement over JET 5 (17% faster). And this time even Harmony does remarkably well (which shows that Sun’s HotSpot has serious problems with this benchmark).

fannkuch benchmark: duration in milliseconds


JET 6 beta 2 is an impressive piece of software. Besides reducing startup time, it delivers impressive performance, beating the JDK 6 and 7 server VMs in three of four benchmarks (and coming very, very close in the fourth). JET 6 beta 2 is a significant improvement over JET 5. In contrast to GCJ, JET is fully compatible with JDK 5 and has passed the JCK tests. The only downside is that it’s a commercial product and comes at a price that makes it unattractive for hobby developers.

GCJ does very well in these benchmarks and appears comparable to GCC in performance, as long as you’re willing to abandon array store and bounds checking. With the checks enabled, GCJ can’t match the performance of current VMs.

Harmony still has a long way to go. Of course it’s still in its early stages and we’ll see performance increase (Sun’s HotSpot also took ages to become that fast), but as of today I wouldn’t use it. I promise to rerun the benchmarks with newer builds, and I’m looking forward to witnessing large performance improvements.