Java’s performance is perceived rather differently depending on whom you ask, with claims ranging from “Java is faster than C” to “Java is 10x slower”.
Without actually running some benchmarks it’s hard to tell who is right, and of course every benchmark will show different results; both sides have good arguments. I don’t know of any real-world application that has been ported from C to Java in such a way that statements about their relative performance would be valid, so the only sources I know of are (micro-)benchmarks. Besides the well-known SciMark and Linpack benchmarks there are some interesting benchmarks in the “Computer Language Benchmarks Game”, formerly known as the Great Language Shootout. It has often been criticized for running too briefly and for including JIT warmup time. Still, I like those benchmarks: they are not classic microbenchmarks, but (almost) every benchmark tries to stress a certain set of language features and returns a well-defined output.
To make it short: I decided to select four computationally intensive, IO-less benchmarks from the shootout.
To avoid the often-criticized inclusion of startup and warmup times, I added to each benchmark a loop that invokes the original benchmark procedure. Each run of the benchmark gets its own command-line parameter and – where possible – every run uses distinct parameters to prevent cheating. The run time is measured by the program itself for each loop iteration, and each result is printed separately. It turned out that the first run is slower than the remaining runs, so I ignored the first run, which includes the JIT warmup cost, and averaged the remaining runs. I’ll take a look at how big these costs are in a future article (but be assured that, except in one case, they are pretty small).
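The measurement loop can be sketched roughly like this (a minimal sketch, not my actual harness code; the `Kernel` interface and all names are hypothetical):

```java
// Sketch of the measuring scheme: run the kernel once per argument, time each
// run from inside the program, print every result separately, then drop the
// first (warmup-tainted) run and average the rest.
final class Harness {
    interface Kernel {
        long run(int arg);   // the original benchmark procedure
    }

    static double averageAfterWarmup(Kernel kernel, int[] args) {
        double sum = 0.0;
        for (int i = 0; i < args.length; i++) {
            long start = System.nanoTime();
            long result = kernel.run(args[i]);           // distinct arg per run
            double millis = (System.nanoTime() - start) / 1e6;
            System.out.println("run " + i + ": " + millis + " ms (result " + result + ")");
            if (i > 0) sum += millis;                    // ignore the first run
        }
        return sum / (args.length - 1);
    }
}
```

The point of timing inside the program (rather than with an external `time` command) is that JVM startup never enters the numbers at all, and the warmup run is visible and can be discarded explicitly.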
So let’s take a look at the benchmarks I’ve chosen: Mandelbrot, Fannkuch, NBody and Spectralnorm. None of them performs any IO or string manipulation, just numeric computation. All but Fannkuch do heavy floating-point computation in double precision, while Fannkuch measures integer array manipulation performance. One important advantage of these benchmarks is that they are implemented very similarly across languages and thus give a good feeling for low-level performance.
On the Java side, the following JVMs were compared: Sun JDK 6U2, Sun JDK 7 b20, IBM JDK 5 and BEA JRockit JDK 6 R27.3.1. Sun’s HotSpot and BEA’s JRockit were measured with the server compiler.
IBM’s JDK 6 is available as a preview version, but its license forbids publishing benchmarks (is that IBM’s contribution to open-source JDKs?). Nevertheless I ran the benchmarks on it, and now I know what I can expect from it (not too much, by the way).
For the C/C++ programs I used GCC 4.2.1 and Intel’s ICC 10.0.
GCC allows different settings for math instructions: the switch -mfpmath=387 uses the x87 FPU, whereas -mfpmath=sse uses the SSE(2) instruction set for floating-point operations. It’s even possible to combine both, but the results didn’t differ from using just one option (which I did for clarity). Thus GCC appears twice in the results, once labeled “gcc (387)” and once “gcc (sse)”.
Most of the time the additional flags “-march=native -msse2 -O3 -funroll-loops -fomit-frame-pointer -fstrict-aliasing -fwhole-program” were used. For ICC I found the flags “-xT -fast” to produce the best code.
All benchmarks were run on my laptop with 2 GB RAM and an Intel Core 2 Duo at 2 GHz, on 32-bit Ubuntu 7.04.
The original mandelbrot benchmark writes a PBM image to stdout and requires that all output happen byte by byte. I changed the rules so that each program must now count how often the sequence stays below the limit and print that number. The input size varied for each loop iteration, increasing from 4000 to 4005 pixels.
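To illustrate the modified rule, here is a minimal sketch of the counting variant (not my actual benchmark source; it follows the shootout’s usual parameters – 50 iterations, escape limit 2, the region [-1.5, 0.5] × [-1, 1] – but the class and method names are mine):

```java
// Count the points of an n-by-n grid whose Mandelbrot iteration stays below
// the limit for all 50 iterations, instead of emitting a PBM bitmap.
final class MandelCount {
    static int count(int n) {
        final int iterations = 50;
        final double limit2 = 4.0;               // |z| <= 2, compared squared
        int inside = 0;
        for (int y = 0; y < n; y++) {
            for (int x = 0; x < n; x++) {
                double cr = 2.0 * x / n - 1.5;   // real axis: [-1.5, 0.5)
                double ci = 2.0 * y / n - 1.0;   // imaginary axis: [-1.0, 1.0)
                double zr = 0.0, zi = 0.0;
                int i = 0;
                while (i < iterations && zr * zr + zi * zi <= limit2) {
                    double t = zr * zr - zi * zi + cr;
                    zi = 2.0 * zr * zi + ci;
                    zr = t;
                    i++;
                }
                if (i == iterations) inside++;   // never escaped: count it
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        System.out.println(count(Integer.parseInt(args[0])));
    }
}
```

The inner while loop is the hot spot: pure double arithmetic with a data-dependent trip count, which is exactly what this benchmark stresses.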
The results show an advantage for C over Java: ICC performed 6% better than gcc (387), 16% faster than JRockit and 22% faster than HotSpot. IBM came in last, 56% behind ICC.
Spectralnorm was run for the argument sizes 5540, 5541, 5542, 5543 and 5550 and was (except for the loop) not modified further.
The results were quite impressive, but also pretty boring: all contenders except GCC 4.2 with 387 instructions were within 5% of each other, and JRockit was even able to beat ICC (by just 1%, but still statistically significant). Spectralnorm appears to be no challenge for a good optimizing compiler (just for fun I once ran it with the HotSpot client compiler – it took more than twice as long).
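For reference, spectralnorm approximates the largest singular value of the infinite matrix with entries A(i,j) = 1/((i+j)(i+j+1)/2 + i + 1). A compact Java sketch of the shootout’s algorithm (power iteration with AᵀA; the class layout and names are mine, not the actual benchmark source):

```java
// Approximate the spectral norm of A by 10 rounds of power iteration with
// B = A^T A; matrix entries are generated on the fly, never stored.
final class SpectralNorm {
    static double a(int i, int j) {
        return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1);
    }

    static void multAv(double[] v, double[] out) {        // out = A v
        for (int i = 0; i < out.length; i++) {
            double s = 0.0;
            for (int j = 0; j < v.length; j++) s += a(i, j) * v[j];
            out[i] = s;
        }
    }

    static void multAtv(double[] v, double[] out) {       // out = A^T v
        for (int i = 0; i < out.length; i++) {
            double s = 0.0;
            for (int j = 0; j < v.length; j++) s += a(j, i) * v[j];
            out[i] = s;
        }
    }

    static double norm(int n) {
        double[] u = new double[n], v = new double[n], tmp = new double[n];
        java.util.Arrays.fill(u, 1.0);
        for (int it = 0; it < 10; it++) {                 // v = B u, then u = B v
            multAv(u, tmp); multAtv(tmp, v);
            multAv(v, tmp); multAtv(tmp, u);
        }
        double uv = 0.0, vv = 0.0;
        for (int i = 0; i < n; i++) { uv += u[i] * v[i]; vv += v[i] * v[i]; }
        return Math.sqrt(uv / vv);
    }

    public static void main(String[] args) {
        System.out.printf("%.9f%n", norm(Integer.parseInt(args[0])));
    }
}
```

The work is two dense matrix-vector products per round, all in double precision – plain, regular floating-point loops, which is probably why every decent optimizing compiler lands so close together here.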
NBody was run with the arguments 19900000, 19800000, 19990000, 19890000 and 20000000.
This time the results were a bit more spread out than with spectralnorm. JDK 7 finished first, with ICC only 1.4% behind. This benchmark seems to be very painful for the IBM JDK: it was less than half as fast as JDK 7.
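The actual benchmark integrates the sun and four giant planets in 3-D; to show the kind of inner loop it stresses, here is a deliberately simplified, hypothetical two-body, 2-D version (not the benchmark source – just the same pattern of pairwise gravity plus a symplectic Euler step):

```java
// Simplified two-body sketch of the nbody inner loop: compute the pairwise
// gravitational interaction, update velocities, then advance positions.
final class NBodySketch {
    static final double DT = 0.01;                       // time step
    static double[] px = {0.0, 1.0}, py = {0.0, 0.0};    // positions
    static double[] vx = {0.0, 0.0}, vy = {0.0, 1.0};    // velocities
    static double[] mass = {1.0, 1e-3};                  // "sun" and "planet"

    static void advance() {
        double dx = px[0] - px[1], dy = py[0] - py[1];
        double d2 = dx * dx + dy * dy;
        double mag = DT / (d2 * Math.sqrt(d2));          // DT / |d|^3
        vx[0] -= dx * mass[1] * mag;  vy[0] -= dy * mass[1] * mag;
        vx[1] += dx * mass[0] * mag;  vy[1] += dy * mass[0] * mag;
        for (int i = 0; i < 2; i++) {
            px[i] += DT * vx[i];
            py[i] += DT * vy[i];
        }
    }

    static double momentumX() {                          // conserved quantity
        return mass[0] * vx[0] + mass[1] * vx[1];
    }

    public static void main(String[] args) {
        for (int step = 0; step < 1000; step++) advance();
        // the symmetric velocity update keeps total momentum (numerically) at 0
        System.out.println(momentumX());
    }
}
```

Note the divide and the square root per body pair: unlike spectralnorm, this loop mixes cheap and expensive floating-point operations, which gives the optimizers more room to differentiate themselves.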
The fannkuch benchmark turned out to be the one with the largest spread. Since the runtime increases very quickly with the argument, I used 11 for every loop iteration.
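The flip-counting kernel works like this: for a permutation of 1..n, repeatedly reverse the first p[0] elements (a “pancake flip”) until 1 is at the front, and count the flips; the benchmark reports the maximum flip count over all permutations. A straightforward, unoptimized sketch (the real programs generate permutations in place without the recursion and cloning used here for clarity):

```java
// Naive fannkuch: enumerate all permutations of 1..n by recursive swapping
// and track the maximum number of prefix reversals ("flips") needed.
final class Fannkuch {
    static int maxFlips;

    static int countFlips(int[] p) {
        int[] q = p.clone();
        int flips = 0;
        while (q[0] != 1) {
            // reverse the first q[0] elements
            for (int lo = 0, hi = q[0] - 1; lo < hi; lo++, hi--) {
                int t = q[lo]; q[lo] = q[hi]; q[hi] = t;
            }
            flips++;
        }
        return flips;
    }

    static void permute(int[] p, int k) {
        if (k == p.length) {
            int f = countFlips(p);
            if (f > maxFlips) maxFlips = f;
            return;
        }
        for (int i = k; i < p.length; i++) {
            int t = p[k]; p[k] = p[i]; p[i] = t;   // choose element i for slot k
            permute(p, k + 1);
            t = p[k]; p[k] = p[i]; p[i] = t;       // restore
        }
    }

    static int fannkuch(int n) {
        maxFlips = 0;
        int[] p = new int[n];
        for (int i = 0; i < n; i++) p[i] = i + 1;
        permute(p, 0);
        return maxFlips;
    }
}
```

For small inputs this reproduces the known values, e.g. fannkuch(7) = 16. Notice that almost every operation is an integer array read or write at a data-dependent index – exactly the access pattern discussed below.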
ICC does a tremendous job and leads JRockit by more than 29%. Both GCC programs finish third (since there are no floating-point operations, I expected them to perform equally). Sun’s HotSpot compiler seems to have severe problems with this benchmark: JDK 7 is 2.1 and JDK 6 even 2.44 times slower than C. Even the JDK 6 client compiler is 12% faster than the JDK 7 server compiler. I decided to file a bug report to see whether Sun can improve performance on this benchmark.
The performance of JRockit was very surprising. The benchmark uses a lot of indirect array accesses, which makes the removal of array bounds checks very hard and thus should give an advantage to the C programs. Still, JRockit was able to beat GCC by almost 6%.
I’ll look at this benchmark again in the future and see if there’s anything that can be done for Java.
It’s hard to draw a conclusion because the results don’t all point in the same direction.
But a few things can be said with confidence:
- ICC is faster than GCC in every benchmark. This is not really news and will surprise almost no one, I guess.
- Judging from worst-case performance, ICC is also the “safest” choice: in its slowest benchmark it was only 1.6% behind the fastest contender.
- Things are not as clear in the Java world, but let’s try a ranking: BEA’s weakest point relative to the other JVMs is the NBody benchmark, where JRockit is 22% slower than the fastest JDK; relative to C, fannkuch was its weakest benchmark, lagging 30% behind ICC. In all other benchmarks it performed better than Sun’s or IBM’s JDKs. Its lead over the other JVMs in fannkuch is impressive, so I’d call JRockit the fastest JVM for these benchmarks.
- The early JDK 7 build 20 is a step in the right direction. Its performance is better than JDK 6’s in every(!) benchmark – by as much as 26% for NBody and 14% for fannkuch.
- Saying that C is generally several times faster than Java is – according to these benchmarks – simply wrong. If you’re allowed to choose the fastest JVM, the worst case for Java was 30%. In other benchmarks Sun and JRockit were even able to beat ICC – not by much, but it is nevertheless remarkable that beating ICC is possible at all. Another interesting figure: BEA was less than 14% slower than GCC in the worst case (NBody) and was faster than GCC in two cases.
- Saying that Java is faster than C can also be pretty wrong, especially if you have to stick with one JVM. The worst cases in these benchmarks ranged from 30% slower for JRockit to 2.44 times slower for Sun’s JDK 6U2.
Like every benchmark before it, I’m sure this one will trigger some interesting questions and discussions. I’d be interested to hear if someone finds flags here or there that put some contender in a different light.
Moreover, there are some interesting compilers that I’d like to benchmark. Among them are LLVM 2.1, different GCC versions (4.1, 4.2 and 4.3 snapshots) – also with profile-guided optimization – and .NET (Microsoft and Mono), maybe also with some other benchmarks. I’ll come back to those points in future blog entries.
The benchmark has since been extended to include Excelsior JET, GCJ and Apache Harmony. You can read about it in Java vs. C benchmark #2: JET, Harmony and GCJ.
Download the Java and C source code