Java vs. C benchmark #2: JET, harmony and GCJ

In one of the comments on my Java vs. C benchmark, Dmitry Leskov suggested including Excelsior JET. JET has an ahead-of-time compiler and is known to greatly reduce startup time for Java applications. I’ve kept an eye on JET since version 4 or so, and while its startup time has always been excellent, the peak performance of the HotSpot server compiler was better. With JET 5, performance on e.g. SciMark has improved greatly, so I decided to rerun the benchmark for JET 5 and JET 6 beta 2. JET 6 beta 2 is currently available on Windows only, so those tests were run under Windows Vista; JET 5 (and all other VMs) ran under Ubuntu. I also benchmarked JET 5 on Windows to check whether there’s a large OS-related difference: the results were within 2.4% (though a t-test still showed a significant difference). As a simplification I decided to publish only the Ubuntu JET 5 results. Nevertheless I’ll update the results when beta 3 becomes available for Linux.

Another interesting VM is Apache Harmony. It is designed to be a complete open source JDK, and it received a lot of attention when it started (though it has become pretty quiet lately). It started before Sun decided to open their JDK under the GPL, so if nothing else Harmony was, in my opinion, one of the reasons that we have Sun’s OpenJDK project now. Harmony’s VM is based on an Intel donation, so that alone makes benchmarking interesting. Of course Harmony is still in the early stages, and it would be almost a miracle if Harmony 1.0 could beat the performance of Sun’s JDK.

The third VM is also an ahead-of-time compiler. GCJ is a Java front end for GCC and thus might produce code roughly identical to GCC’s. There isn’t much publicity for GCJ despite its effort to become a really usable JVM. Combined with the GIJ interpreter and gcj-dbtool, GCJ is able to compile even complex applications like Eclipse. GCJ uses GNU Classpath as its underlying implementation of the JDK classes, which means some parts of the JDK are still missing. I decided to use the Ubuntu gcc 4.3 snapshot as it turned out to work best on my PC.

Settings

Update: Please read the Java vs. C benchmark for a description of the benchmarks. Just a quick reminder: The numbers show the peak performance after a warmup run.
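
To make “peak performance after a warmup run” concrete, here is a minimal sketch of that kind of measurement, assuming all runs happen in one process and the first one is simply discarded. The class name PeakTimer, the workload method and the loop size are hypothetical stand-ins for illustration, not the code behind the published numbers.

```java
public class PeakTimer {
    // Written to after every run so the JIT cannot discard the workload as dead code.
    static volatile long sink;

    // Hypothetical workload standing in for one of the benchmarks.
    public static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    // Runs the task runs + 1 times in the same process, drops the first
    // (warmup) run and returns the remaining durations in milliseconds.
    public static double[] measure(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i <= runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (i > 0) {
                millis[i - 1] = elapsed / 1e6;
            }
        }
        return millis;
    }

    public static void main(String[] args) {
        // Four measured runs after one warmup run (sample size assumed here).
        double[] samples = measure(() -> sink = workload(20_000_000), 4);
        for (double ms : samples) {
            System.out.printf("%.1f ms%n", ms);
        }
    }
}
```

A single warmup run is an assumption that may be too short for VMs with a long warmup phase; in that case the number of discarded runs would have to be raised.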

I used JET with the global optimizer and chose aggressive inlining. I was told by Excelsior that both settings shouldn’t matter for those benchmarks but I didn’t validate that.

Harmony was run in server mode by passing “-Xem:server”.

For GCJ I chose either -mfpmath=sse or -mfpmath=387, depending on which setting had performed better for GCC. The other options were “-march=native -msse2 -O3 -funroll-all-loops -fomit-frame-pointer -funsafe-math-optimizations -ffast-math -fstrict-aliasing -fwhole-program”. In all benchmarks except mandelbrot I additionally chose “-fno-store-check -fno-bounds-check”. These settings disable array store and bounds checking, so I’ve even allowed aggressive optimizations that violate the JLS. This optimization might be considered unfair because the other VMs have to perform those checks (though good optimizers should optimize them away; it seems that HotSpot fails to remove all array bounds checks for the Fannkuch benchmark). On the other hand, if you know that your code throws neither ArrayIndexOutOfBoundsException nor ArrayStoreException, you might be interested in taking the performance advantage.
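
For readers who haven’t run into the two checks, the following small sketch shows what a conforming JVM has to do on every array store. The class and method names are made up for illustration. On a JLS-compliant VM both methods report that the exception was thrown; in code compiled with -fno-store-check and -fno-bounds-check the checks are omitted, so that outcome is no longer guaranteed.

```java
public class ArrayChecks {
    // Array store check: arrays are covariant, so a String[] can be
    // referenced as Object[], and every store must verify the element type.
    public static boolean storeCheckFires() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42); // not a String
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Bounds check: every indexed access must verify 0 <= index < length.
    public static boolean boundsCheckFires() {
        int[] data = new int[4];
        int index = data.length; // one past the last valid index
        try {
            data[index] = 1;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("store check fired:  " + storeCheckFires());
        System.out.println("bounds check fired: " + boundsCheckFires());
    }
}
```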

Results

Mandelbrot

GCJ is faster than GCC (significantly according to a t-test), but let’s not forget that it’s GCC 4.2 vs. GCJ 4.3 and GCC performance varies quite a bit across versions.
JET 6 is faster than JET 5 but both are slightly behind Sun’s JDK. Harmony is about 26% slower than JDK 6. Here’s the picture (all diagrams display the duration in milliseconds, i.e. smaller is better of course):

mandelbrot benchmark: duration in milliseconds
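
Since significance claims come up a few times, here is a rough sketch of how a two-sample t statistic can be computed from a handful of timings. It assumes Welch’s unequal-variance variant, which may not be the exact test behind the numbers in this post, and the durations in main are invented purely for illustration. The class name WelchT is hypothetical as well.

```java
public class WelchT {
    public static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    public static double variance(double[] x) {
        double m = mean(x);
        double sumSq = 0;
        for (double v : x) sumSq += (v - m) * (v - m);
        return sumSq / (x.length - 1);
    }

    // Welch's t statistic for two independent samples.
    public static double tStatistic(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    public static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // Invented durations in milliseconds, four samples per VM.
        double[] vmA = {1510, 1498, 1505, 1502};
        double[] vmB = {1531, 1540, 1528, 1536};
        System.out.printf("t = %.3f, df = %.2f%n",
                tStatistic(vmA, vmB), degreesOfFreedom(vmA, vmB));
    }
}
```

The resulting t value still has to be compared against a critical value from a t table for the computed degrees of freedom; with only a few samples per VM that threshold is high, so small differences should be read with caution.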

Spectralnorm

Both JET versions and GCJ are able to deliver good performance. Running GCJ with store and bounds checks enabled made it 44% slower. Harmony is far behind, being 60% slower than JET 6.

spectralnorm benchmark: duration in milliseconds

NBody

Once again JET and GCJ perform very well. JET 6 was even the fastest and was 31% faster than JET 5, but keep in mind that JET 6 was run on Windows Vista while all others ran on Ubuntu Linux.
I also tested GCJ with store and bounds checking enabled. It was around 20% slower than the GCJ version without checks.
Harmony had a real problem with the benchmark. It took almost 28 times longer to finish the benchmark than JET 6. Harmony with the option -Xem:client was a little bit quicker, but more than 260 seconds is still fairly bad.

nbody benchmark: duration in milliseconds

Fannkuch

We saw last time that fannkuch was one of the most interesting benchmarks, with widely spread results. GCJ 4.3 was a little quicker than GCC 4.2. Enabling store and bounds checks made GCJ 34% slower. JET 6 shows a clear improvement over JET 5 (17% faster). And this time even Harmony does remarkably well (which shows that Sun’s HotSpot has serious problems with that benchmark).

fannkuch benchmark: duration in milliseconds

Conclusion

JET 6 beta 2 is an impressive piece of software. Besides reducing startup time, it delivers impressive performance, beating JDK 6 and 7 server in three of four benchmarks (and being very, very close in the fourth). JET 6 beta 2 is a significant improvement over JET 5. In contrast to GCJ, JET is fully compatible with JDK 5 and has passed the JCK tests. The only downside is that it’s commercial and comes at a price that makes it unattractive for hobby developers.

GCJ does very well in those benchmarks and appears to be comparable with GCC performance as long as you’re willing to abandon array store checks and bounds checking. With the checks enabled GCJ can’t match the performance of current VMs.

Harmony still has a long way to go. Of course it’s still in the early stages and we’ll see performance increasing (Sun’s HotSpot also took ages to become that fast), but as of today I wouldn’t use it. I promise to rerun the benchmarks with newer builds, though, and I’m looking forward to witnessing large performance improvements.

11 thoughts on “Java vs. C benchmark #2: JET, harmony and GCJ”

  1. “a t-Test showed a significant difference”

    As you’re doing t tests, what’s the sample size now?

  2. Hi Issac, the sample size hasn’t changed (i.e. it is still only 4). I understand and accept your criticism that these benchmarks aren’t perfectly scientific (one reason being that the sample size is too small).
    However I’m not going to change the sample size simply because I don’t have enough time for it and that’s not the purpose of those benchmarks (you could do that on the language shootout, but since JRockit has a much larger warmup phase than Hotspot the tests should be adapted to run multiple times in one process like here or at least take much longer).
    These benchmarks should show the peak performance of JVMs compared to each other and to C++. I think the results presented here are exact enough for deciding whether the factor 2-3 is true or not (and depending on the benchmark it is or is not…). Harmony wouldn’t beat C++ if the sample size went up, and the HotSpot compilers wouldn’t perform any better for fannkuch either.

    Still, to justify publishing a ranking even when results were very close, as for nbody: I agree that small differences could affect the results, but the samples had very small variation. Tests that showed a large variation were rerun to eliminate influence by other OS processes as much as I could. You can take a look at a “scatterplot” (or at least what can be done easily with OpenOffice) for NBody at http://www.stefankrause.net/wp/imgs/scatterplot_nbody.jpeg (the first sample is the warmup run, which has been left out; you can see that JRockit indeed has a much slower warmup phase).

  3. “aren’t perfectly scientific”

    My guess is that with so few samples the t-test is pretty meaningless.

    I won’t quarrel about whether you have enough time or not; obviously it’s different from my perspective. You have programs running 10, 20, 50 seconds and the benchmarks game has programs running 9,000 to 13,000 seconds, so it just seemed possible for you to do more samples.

  4. Hi Stefan,

    If people start reading the “Java vs. C performance” series from the November issue, they may be unaware that you “warm up” JVMs with dynamic compilers before measuring the time, so that peak performance is what is actually reported.

    I think it would be useful to clearly mention it each time and give a link to the description of your testing methodology.

    —————

    ['bout GCJ:]
    “These settings prevent array store and bounds checking so I’ve even allowed aggressive optimizations that violate the JLS. This optimization might be considered unfair because the other VMs have to perform those checks”

    It’s OK for benchmarking. I believe JVMs should do a better job of removing index checks and should show good comparative results even if checks are turned off for GCJ or no checks are generated by C compilers.

    Thanks,
    –Vitaly

  5. Wow, it’s the first time I’ve heard of “-p” releases. The link and download page don’t have much information about them. For Linux there’s just an x64 version, is that right? What are the differences from the normal releases? Is 6u5-p already available as a download?

  6. I am afraid you are right. There are only x64 versions. Currently only the 6u4-p download is available, and it is said to be 10% behind 6u5-p. I hadn’t checked that before :(
    I do not have much information about the differences between performance releases and normal releases. From the blog I expect their JIT compiler to produce more efficient code.

  7. Stefan, I just wrote a Java benchmarking framework that may be of interest to you to use in benchmarking Java code. In fact, this framework will be discussed as part of an article that I am writing for IBM developerWorks on measuring Java performance. If interested, send me an email, and I can send you an article preprint.
