Java vs. C benchmark

Java’s performance is perceived very differently depending on whom you ask, with opinions ranging from “Java is faster than C” to “Java is 10x slower”.
Without actually running benchmarks it’s hard to tell who is right, and of course every benchmark shows different results and both sides have good arguments. I don’t know of any real-world applications that have been ported from C to Java in such a way that statements about their relative performance are valid. So the only sources I know of are (micro-)benchmarks. Besides the well-known SciMark and Linpack benchmarks there are some interesting benchmarks in the “Computer Language Benchmarks Game”, formerly known as the Great Computer Language Shootout. It has often been criticized for run times that are too short and for including warmup times for JITs. Still, I like those benchmarks since they are not classic microbenchmarks: (almost) every benchmark tries to stress a certain set of language features and returns a well-defined output.
To make it short: I decided to select four computationally intensive, IO-less benchmarks from the shootout.
To avoid the often criticized inclusion of startup and warmup times, I added a loop to each benchmark that invokes the original benchmark procedure several times. Each run gets its own command line parameter and, where possible, every run uses a distinct parameter to prevent cheating. The program itself measures the run time of each loop iteration and prints each result separately. It turned out that the first run is slower than the remaining runs. Thus I ignored the first run, which includes the warmup costs for JITs, and averaged the remaining runs. I’ll take a look at how big these costs are in a future article (but be assured that, except in one case, they are pretty small).
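
The measuring loop described above could look roughly like the following sketch. It is not the code from the download; the class name BenchLoop, the method averageAfterWarmup and the dummy workload are made up for illustration, and it uses lambdas, so it assumes a newer Java than the JDKs measured here.

```java
import java.util.function.IntConsumer;

public class BenchLoop {
    // Runs the benchmark once per argument, prints every duration and
    // returns the average in milliseconds of all runs except the first.
    static double averageAfterWarmup(IntConsumer benchmark, int[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("need a warmup run plus at least one measured run");
        }
        long measuredNanos = 0;
        for (int i = 0; i < args.length; i++) {
            long start = System.nanoTime();
            benchmark.accept(args[i]);
            long elapsed = System.nanoTime() - start;
            System.out.printf("run %d (arg %d): %.3f ms%n", i, args[i], elapsed / 1e6);
            if (i > 0) {
                measuredNanos += elapsed; // run 0 is treated as warmup and skipped
            }
        }
        return measuredNanos / 1e6 / (args.length - 1);
    }

    public static void main(String[] argv) {
        // Hypothetical stand-in workload; a real run would call the benchmark procedure.
        IntConsumer dummy = n -> {
            double sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                System.out.println(sum); // keeps the loop from being optimized away
            }
        };
        int[] sizes = {4000, 4001, 4002, 4003, 4004, 4005};
        System.out.printf("average without first run: %.3f ms%n", averageAfterWarmup(dummy, sizes));
    }
}
```

Passing a different argument to every run is what keeps a JIT from specializing the code for one particular input.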

So let’s take a look at the benchmarks I’ve chosen: Mandelbrot, Fannkuch, NBody and Spectralnorm. All of them perform no IO or string manipulation, but just numeric computations. All but Fannkuch perform some heavy floating point computations in double precision, while Fannkuch measures integer array manipulation performance. One important advantage of these benchmarks is that they are implemented very similarly and thus give a good feeling for the low level performance.

The Contenders

As for the Java faction, the following JVMs were compared: Sun JDK 6U2, Sun JDK 7b20, IBM JDK 5 and Bea JRockit JDK 6 R 27.3.1. Sun’s Hotspot and Bea’s JRockit were measured with the server compiler.
IBM’s JDK 6 is available as a preview version, but its license forbids publishing benchmarks (is that IBM’s contribution to Open Source JDKs?). Nevertheless I ran the benchmarks on it and now I know what I can expect from it (not too much, btw.).
For the C/C++ programs I’ve used gcc 4.2.1 and Intel’s ICC 10.0.
GCC allows different settings for floating point math instructions. The switch -mfpmath=387 uses the 387 FPU whereas the switch -mfpmath=sse uses the SSE(2) instruction set for floating point operations. It’s even possible to combine both, but the results didn’t differ from using just one option (which I did for clarity). Thus gcc appears twice in the results, once labeled “gcc (387)” and once “gcc (sse)”.
Most of the time the additional flags “-march=native -msse2 -O3 -funroll-loops -fomit-frame-pointer -fstrict-aliasing -fwhole-program” were used. For icc I found the flags “-xT -fast” to create the best code.
All benchmarks were run on my laptop with 2 GB RAM and an Intel Core 2 Duo running at 2 GHz, on 32-bit Ubuntu 7.04.

Mandelbrot

The original mandelbrot benchmark writes a PBM image to stdout and requires that all output happens byte by byte. I’ve changed the rules such that each program must now count how often the sequence stays below the limit and print that number. The input size varied for each loop, increasing from 4000 to 4005 pixels.
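
To illustrate the changed rule, here is a minimal sketch of the counting variant. It is not the benchmarked program: the iteration limit of 50, the escape bound of 2.0 and the mapping of pixels to the complex plane are assumptions modeled on the usual shootout setup, and the names are made up, so the counts it prints need not match the ones produced by the downloadable sources.

```java
public class MandelbrotCount {
    static final int MAX_ITER = 50;     // assumed iteration limit
    static final double LIMIT_SQ = 4.0; // assumed bound |z| <= 2, compared as |z|^2 <= 4

    // Counts the points of a size x size grid whose sequence z -> z*z + c
    // stays below the limit for all MAX_ITER iterations.
    static int countInside(int size) {
        int count = 0;
        for (int y = 0; y < size; y++) {
            double ci = 2.0 * y / size - 1.0;
            for (int x = 0; x < size; x++) {
                double cr = 2.0 * x / size - 1.5;
                double zr = 0.0, zi = 0.0;
                boolean inside = true;
                for (int i = 0; i < MAX_ITER; i++) {
                    double next = zr * zr - zi * zi + cr;
                    zi = 2.0 * zr * zi + ci;
                    zr = next;
                    if (zr * zr + zi * zi > LIMIT_SQ) {
                        inside = false; // sequence escaped, stop iterating
                        break;
                    }
                }
                if (inside) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 200;
        System.out.println("count=" + countInside(size));
    }
}
```

Because only a single number is printed, no IO happens inside the measured loop.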

results of mandelbrot benchmark

The results show an advantage for C over Java. icc performed 6% better than gcc (387), 16% better than JRockit and 22% better than hotspot. IBM came out last, 56% behind icc.

Spectralnorm

Spectralnorm was run with argument sizes 5540, 5541, 5542, 5543 and 5550 and was (except for the loop) not further modified.

results of spectralnorm benchmark

The results were quite impressive, but also pretty boring. All except gcc 4.2 with 387 instructions were within 5% of each other, and JRockit was even able to beat ICC (just by 1%, but still statistically significant). Spectralnorm appears to be no challenge for a good optimizing compiler (just for fun I once tried the hotspot client compiler; it ran more than twice as long).

NBody

NBody was run with arguments 19900000, 19800000, 19990000, 19890000 and 20000000.

results of nbody benchmark

This time the results were a bit more widespread than with spectralnorm. JDK 7 finished first with icc only 1.4% behind. This benchmark seems to be very painful for the IBM JDK. It was less than half as fast as JDK 7.

Fannkuch

The fannkuch benchmark turned out to be the one with the largest spread. Since the runtime increases very quickly with the argument, I used 11 for each loop.

results of fannkuch benchmark

ICC does a tremendous job and leads JRockit by more than 29%. Both GCC programs finish third (since there are no floating point operations I expected them to perform equally). Sun’s hotspot compiler seems to have severe problems with this benchmark. JDK 7 is 2.1 times and JDK 6 even 2.44 times slower than C. Even the JDK 6 client compiler is 12% faster than the JDK 7 server compiler. I decided to file a bug report on this to see whether Sun can improve performance for this benchmark.
The performance of JRockit was very surprising. The benchmark uses a lot of indirect array access, which makes removal of array bounds checking very hard and thus should give an advantage to the C programs. Still it was able to beat GCC by almost 6%.
I’ll look at this benchmark again in the future and see if there’s anything that can be done for Java.
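
To show what “indirect array access” means here, the following is a small sketch of the flip counting at the heart of fannkuch. It is a simplified illustration and not the benchmarked program: the class and method names are made up, and it enumerates the permutations with plain recursion instead of the optimized scheme the shootout programs use.

```java
public class FannkuchSketch {
    // Reverses the first p[0]+1 elements until the first element is 0
    // (values are 0-based) and returns the number of reversals.
    static int countFlips(int[] perm) {
        int[] p = perm.clone();
        int flips = 0;
        while (p[0] != 0) {
            // The loop bound comes from the array itself, so a JIT has a
            // hard time proving that j always stays inside the array.
            for (int i = 0, j = p[0]; i < j; i++, j--) {
                int t = p[i];
                p[i] = p[j];
                p[j] = t;
            }
            flips++;
        }
        return flips;
    }

    // Tries every permutation of perm[pos..] and returns the maximum flip count.
    static int search(int[] perm, int pos) {
        if (pos == perm.length) {
            return countFlips(perm);
        }
        int best = 0;
        for (int i = pos; i < perm.length; i++) {
            swap(perm, pos, i);
            best = Math.max(best, search(perm, pos + 1));
            swap(perm, pos, i); // restore the original order
        }
        return best;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    static int maxFlips(int n) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i;
        }
        return search(perm, 0);
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 7;
        System.out.println("Pfannkuchen(" + n + ") = " + maxFlips(n));
    }
}
```

In C the same loop runs without any bounds checks, which is why I expected the C programs to have an edge on this benchmark.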

Conclusion

It’s hard to draw a single conclusion because the results don’t all point in the same direction.
But a few things can be said with confidence:

  • ICC is faster than GCC in every benchmark. This is not really new information and will surprise almost no one I guess.
  • Judging from the worst case performance ICC is also the “safest” choice. In its slowest benchmark it was 1.6% behind the fastest contender.
  • Things are not as clear in the Java world, but let’s try a ranking: among the JVMs, Bea’s weakest point is the NBody benchmark, where it’s 22% slower than the fastest JDK. Overall, fannkuch was its weakest benchmark, with a lag of 30% in comparison to ICC. In all other benchmarks it performed better than Sun’s or IBM’s JDKs. Its lead over the other JVMs in fannkuch is impressive, so I’d call JRockit the fastest JVM for those benchmarks.
  • The early JDK 7 build 20 is a step in the right direction. Its performance is better than JDK 6 in every(!) benchmark, even by 26% for NBody and 14% for fannkuch.
  • Saying that C is generally several times faster than Java is, according to those benchmarks, simply wrong. If you’re allowed to choose the fastest JVM, the worst case for Java was 30% slower. In other benchmarks Sun and JRockit were even able to beat ICC. Not by much, but I find it remarkable that it’s possible to beat ICC at all. Another interesting figure is that Bea was less than 14% slower than GCC in the worst case (nbody) and was faster than GCC in two cases.
  • Saying that Java is faster than C can also be pretty wrong, especially if you have to stick with one JVM. The worst cases in these benchmarks ranged from 30% slower for JRockit to 2.44 times slower for Sun’s JDK 6U2.

Outlook

Like every benchmark before it, I’m sure this one will raise some interesting questions and discussions. I’m interested in whether someone finds some flags here or there that will put some contender in a different light.
Moreover there are some interesting compilers that I’d like to benchmark. Among those are LLVM 2.1, different gcc versions (4.1, 4.2 and 4.3 snapshots), also with profile guided optimization, and .NET (MS and Mono), maybe also with some other benchmarks. I’ll come back to those points in future blog entries.

Update

The benchmark has been extended to include Excelsior JET, GCJ and Apache Harmony. You can read about it in Java vs. C benchmark #2: JET, harmony and GCJ.

Download the java and C source code

30 thoughts on “Java vs. C benchmark”

  1. My computer and embedded devices do not have unlimited memory. Java VMs use more memory in most if not all these tests. In fact, I would not be surprised to find that the JVMs perform closer to C on tests when memory usage is lower. If you go to http://shootout.alioth.debian.org you can score all of these tests and more. By default, CPU usage is rated by a factor of 1 and memory 0. Rate both as one or at least CPU 4 and memory 1. Memory is at least that important a concern. You will see Java fall in the rankings.

  2. Hi Daniel,
    the answer is a bit hidden in the text. Yes all benchmarks were run with the server compiler for JRockit and Sun. I also measured the client VM, but except in fannkuch it was much, much slower (2x slower in spectralnorm).
    Yours,
    Stefan

  3. “I don’t know of any real world applications that has been ported from C to Java in such a way that statements about their relative performance are valid.”

    I ran across this some years ago, and it is imho a valid and well done comparison of not only the performance difference between Java and C, but also a good synthesis of porting problems and solutions.
    http://research.sun.com/techrep/2002/abstract-114.html

    My own experience in porting image filtering application showed that there are great gains that can be obtained, gains that cannot be reached in C, if you take the opportunity to adapt algorithms and inner working to jvm specifics ( dynamic inlining, for example ).

  4. No matter how fast Java is, Java desktop applications still have an unacceptable startup delay. TextMate, for example, starts up much faster than JEdit. Visual Studio starts up faster than Java IDEs. It would be great if we could port a popular and non-trivial Java desktop app to Qt, Gtk, .NET, Cocoa, etc. and compare their startup speeds, memory consumptions, etc. so we could have a more realistic comparison.

    For server side deployments, this is not that big of a deal. But when it comes to desktop applications, it is very important.

  5. Hi, Stefan. StumbledUpon this. I have to agree with Behrang; in real-world scenarios, more often than not, a C/++ application will outperform Java. I’m not really a Java expert, but I think this is because of all the extra baggage that comes with the latter (JVM and the libraries).

    Number-crunching tests may prove me wrong, but from my experience with Java apps, they’re simply slower and less responsive.

    Case: Azureus and Limewire, both on my UbuntuStudio boot and on my WinXP boot, perform very sluggishly on my dual-core Athlon (with 2G of RAM). For me, it’s a simple question of where Occam’s razor should apply – during development, or runtime?

  6. Stefan,

    “I’m interested if someone finds some flags here or there that
    will put some contender in a different light.”

    Try to enable PGO (profile guided optimizations) for the C compilers. They are very important for many optimizations such as regalloc, code placement, etc. It may squeeze out a few percent of performance. Note also that the cited JVMs perform profile guided optimizations “by default”.

    ——

    “It turned out that the first run is slower than the remaining runs. Thus I’ve ignored the first run which includes warmup costs for JITs and averaged the remaining runs. I’ll take a look at how big these costs are in a future article (But be assured that except one case they are pretty small).”

    I’m not sure which case you mean, but a known counterexample for the warm-up technique is a flat application profile (no hot methods/loops, lots of “warm” methods). For instance, the Swing implementation has a flat profile and that’s one of the reasons why Java GUIs may perform sluggishly. I used to write about that, e.g. in this article.

  7. Always nice to see someone having fun with The Computer Language Benchmarks Game :-)

    A couple of questions and some suggestions:
    - you write “every run uses distinct parameters to prevent cheating” but I don’t understand what “cheating” you are trying to prevent?
    - what’s the x-axis on the charts? seconds? milliseconds? …
    - yes mandelbrot writes to stdout but stdout is redirected to /dev/null – did not having mandelbrot write the pbm make a noticeable difference in the relative performance of the programs?

    We got so bored with the “warmup costs for JITs” excuse that we came up with this comparison

    http://shootout.alioth.debian.org/gp4/miscfile.php?file=dynamic&title=Java%20Dynamic%20Compilation

    Because you are only comparing “fast” programming language implementations that are very close, and you are only measuring 4 benchmarks, let me suggest that you do enough measurements for some statistics – instead of just 4, if you’re looking for small effects maybe 400 in each sample, if you’re only looking for bigger effects then maybe 65 in each sample.

    Then instead of showing those bar charts, show box plots
    http://www.lcgceurope.com/lcgceurope/data/articlestandard/lcgceurope/132005/152912/article.pdf

    and maybe do one-way anova to look for significant differences.

  8. frederic barachant wrote “I ran across this, some years ago, and is imho a valid and well done comparison of not only difference of performance in java and C …”

    Unfortunately they didn’t port the improvements they made in the Java version back to the C version, which undermines how valid it is as a Java C performance comparison.

  9. Hi Isaac,

    I’m sorry that your comments didn’t show up. WordPress has some kind of analyzer that rates your comments and apparently some words confused wordpress. The comments were queued and I could approve them just recently.

    First of all thanks for the computer language shootout – I love it!

    > – you write “every run uses distinct parameters to prevent cheating” but I don’t understand what “cheating” you are trying to prevent?
    The JIT might (or might not, who knows) optimize the code for exactly the current arguments given. A clever compiler might e.g. run the program with array bounds checking enabled for the first run and then eliminate the checks for further runs if all the parameters are identical and the first run was okay. If the input arguments vary, the JIT simply cannot optimize that much for special cases. Another thing is that it is simply pointless to create exactly the same output more than once…

    > what’s the x-axis on the charts? seconds? milliseconds?
    Yes, that’s very bad style. The information is completely missing: It’s the duration in milliseconds.

    > did not having mandelbrot write the pbm make a noticeable difference in the relative performance of the programs?
    Yes, .NET on windows had a severe performance handicap when redirecting stdout to a file (Results will appear soon on this blog).

    > let me suggest that you do enough measurements for some statistics
    I think the important point is not whether JRockit or ICC is faster as long as they are within 1 or 2%. The IMO interesting cases are when compilers are slower by a factor of 1.5 and more.

    > You talk about looking into “warmup costs for JIT” in a future article – the benchmarks game FAQ shows this comparison
    I know that page and agree that warm up time most of the time doesn’t matter for such small benchmarks. But there are two factors to consider. JRockit does indeed have a much slower first run for some benchmarks, for hotspot the effect is much weaker. And java people simply insist that benchmarks including warmup time are biased…

  10. > “Another thing is that is simply pointless to create exactly the same output more than once…”
    Which is why you don’t need to bother about a clever compiler or JIT optimizing that case – it would be a pointless optimization!

    If you’re going to play with different argument values I suggest you do it like this:
    http://blog.marketcetera.com/2007/03/08/java-and-the-computer-language-shootout/

    I’m a bit confused about your .NET on windows redirecting stdout comment – I thought this was Ubuntu?

    > “I think the important point is not whether JRockit or ICC is faster as long as they are within 1 or 2%.”

    I agree but I think you’re missing the point – you don’t know that they are within 1 or 2% because you aren’t doing enough measurements. You are measuring “elapsed time” on an OS that has many processes running – you aren’t just measuring the time taken by those programs.

    Look at how much variation there is in the “elapsed time” measurements for Java nbody
    http://shootout.alioth.debian.org/gp4/miscfile.php?file=dynamic&title=Java%20Dynamic%20Compilation#nbody

    Box plots will show that variation and let people see how most of the measurements for one program compare to another program – and they’ll show the “slower first run” as an uncharacteristic outlier.

    The only reason we don’t take this approach on the benchmarks game is we’re dealing with languages that take hours to complete, and we’re dealing with many languages – you can be much smarter and do it right.

  11. Try -ffast-math in gcc; there should be significant speedup in some benchmarks. Especially on the core2, fpmath=sse shouldn’t be slower than 387…

  12. JIT warmup time can be an important concern. Many of the programs I use most often exit almost immediately (ls, cat, echo) and waiting on a VM to start up is a pain. Ultimately the question “which language is faster” is meaningless

  13. -ffast-math does not improve the results. I’ve rerun mandelbrot with GCC 4.3. Common options were -march=native -msse2 -O3 -funroll-loops -fomit-frame-pointer -funroll-loops -fomit-frame-pointer -fstrict-aliasing -fwhole-program.
    The average duration was:
    2711 msec -mfpmath=387
    2914 msec -mfpmath=387 -ffast-math
    3404 msec -mfpmath=sse
    3427 msec -mfpmath=sse -ffast-math

    For 387 the degradation with ffast-math is quite heavy, for sse it’s less pronounced but still every sample was slower than without ffast-math

  14. Strange. I’ve tried several java benchmarks (scimark2, ackerman, my own micro benchmarks for LinkedBlockingQueue and ConcurrentLinkedQueue) and on my Linux machines (each has 2x xeon 5160 3Ghz) 32 bit version of Sun’s JVM 1.6_03 seems to be definitely the fastest one. It definitely outperforms the BEA’s JRockit (R27.3.1) in 99% of tests.
    However, I’m heavily optimizing with tons of JVM arguments, both for Sun’s JVM and Bea’s JRockit, so this may be the reason.

    With 64 bit version of Sun’s JVM the results are different: JRockit wins in many cases then. Luckily I don’t need more memory then -Xmx2400m (I’m using -XX:+UseLargePages too), so 32 bit version is ok with me.

    Can you please try to rerun the tests in which the JRockit outperforms the Sun’s JVM on your machine with the following JVM arguments (for Sun’s JDK – you will need to adjust the memory args to your machine’s memory though)? – I’m curious if the JRockit will still be the best..

    -server -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -Xms1024m -Xmx1900m -XX:NewRatio=2 -XX:NewSize=448m -XX:+UseLargePages -XX:CompileThreshold=1500 -XX:+UseThreadPriorities -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=32 -XX:TargetSurvivorRatio=70 -XX:+UseTLAB -XX:MinTLABSize=4k -XX:TLABSize=256k -XX:+ResizeTLAB -XX:+UseBiasedLocking -XX:+UseSpinning -XX:PreBlockSpin=80 -XX:MaxInlineSize=200 -XX:FreqInlineSize=200

    And just for reference, the JRockit’s arguments I’m using:

    -jrockit -XlargePages -Xms:1024m -Xmx:1900m -Xns:512m -XXaggressive:memory -XXcompressedRefs=true -Xgcprio:deterministic -XpauseTarget=30ms -XX:+UseNewHashFunction -XX:+UseThreadPriorities -XXallocPrefetch -XXcallProfiling -XXtlaSize:min=8k,preferred=512k -XXlazyUnlocking

    (And yes, I know I’m not using the garbage collector with the highest possible throughput, but my applications require deterministic, as low as possible, pause times so this is a priority for me..)

    Thanks!

  15. The thing that confuses me is that the D version (http://paste.dprogramming.com/dpdvgkvd ), although it does come close (3.5s vs 3.56s) to the performance of the C++ version, produces slightly different numbers for arg=4000 (count=6351040 for the C++ version VS count=6362936 for the D version). I think it’s because of slight implementation differences between D’s cdouble data type VS C++’s double re, im, but I have no idea what they could be. :shrug:

    Anyway, here’s my results.

    gentoo-pc ~/d/benchmarks $ g++ mandelbrot_long.cpp -O3 -ffast-math -funroll-loops -o m_cp && sudo nice -n -20 ./m_cp 4000; gdc mandelbrot_long.d -O3 -ffast-math -funroll-loops -frelease -o m_d && sudo nice -n -20 ./m_d 4000
    duration 3495.832000 count=6351040
    duration 3547.14 count 6362936

    In fannkuch, gdc comes out a teensy bit ahead .. (D source: http://paste.dprogramming.com/dpvb23tt)

    gentoo-pc ~/d/benchmarks $ gdc fannkuch_long.d -o f_d -O3 -frelease -ffast-math && sudo nice -n -20 ./f_d 11; g++ fannkuch_long.cpp -o f_cp -O3 -ffast-math && sudo nice -n -20 ./f_cp 11
    Pfannkuchen(11) = 51
    duration 7773.07
    Pfannkuchen(11) = 51
    duration 7792.536000

    I was too lazy to translate the rest. :)

    –feep

    I can easily imagine that Xeons show a different behaviour than a Core 2.
    I’ve checked it multiple times: your settings do not improve the results (I had to remove UseLargePages because it printed a warning). There was no run that performed better than a plain java -server. But since none of the benchmarks create new objects in their loops, all GC and memory related settings shouldn’t matter. As there’s also no locking, escape analysis and the locking flags don’t help either. How did you find exactly that set of arguments?

    The results were:
    3342 msec for java -server -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -Xms128M -Xmx128M -XX:NewRatio=2 -XX:NewSize=64m -XX:CompileThreshold=1500 -XX:+UseThreadPriorities -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=32 -XX:TargetSurvivorRatio=70 -XX:+UseTLAB -XX:MinTLABSize=4k -XX:TLABSize=256k -XX:+ResizeTLAB -XX:+UseBiasedLocking -XX:+UseSpinning -XX:PreBlockSpin=80 -XX:MaxInlineSize=200 -XX:FreqInlineSize=200 mandelbrot_long 4000 4001 4002 4003 4004 4005

    3291 msec for java -server mandelbrot_long 4000 4001 4002 4003 4004 4005.

    I also had to change the JRockit setting such that they work on my notebook (without a commercial JRockit license):
    /home/stef/progs/jrockit-R27.3.1-jdk1.6.0_01/bin/java -server -Xms:128m -Xmx:128m -Xns:32m -XXaggressive:memory -Xgcprio:pausetime -XX:+UseNewHashFunction -XXallocPrefetch -XXcallProfiling -XXtlaSize:min=8k,preferred=512k -XXlazyUnlocking mandelbrot_long 4000 4001 4002 4003 4004 4005

    The results were indistinguishable for your settings and just using -server and were about 3200 msec.

  17. “ICC is faster than GCC in every benchmark. This is not really new information and will surprise almost no one I guess.”

    Oh? I am surprised(though maybe not much since you are using a Core 2 Duo…). Here are my numbers for one of the benchmarks, with gcc 4.3 snapshot, and icc 10.0.023 on a P4.

    fannkuch 11 runs:

    icc -xN -O3 -ipo -static: 5.9s
    gcc -O3 -march=pentium4 -funroll-loops -fomit-frame-pointer: 5.5s
    gcc as above with profile-generate/profile-use: 5.0s

  18. One problem is that you’re not creating any garbage. All real applications produce garbage. Java is slower than C when you have lots of objects.

  19. You’re likely not reading comments this late in the game, but I wanted to ask for another implementation: gcc-2.95. Why would you benchmark this? Well, I’ve read in multiple places that gcc’s performance has gotten progressively worse with each version.

    For example a comp.lang.forth post giving this some credibility.
    http://tinyurl.com/yvnh4t

    Would be interesting if gcc-2.95 rocked ICC on any of the tests.

  20. What I would like to see is a program compiled using 2003 compilers and run today.

    I would expect C to be inherently a bit quicker than Java.
    But the advantage of bytecode on a VM is that as the VM improves, so does the byte-coded
    application.
    That is not always the case for compiled applications.

  21. “One problem is that your not creating any garbage. All Real applications produce garbage. Java is slower than C when you have lots of objects.”

    Not true!

    First, C does not have actual objects per se (things like structs are much lighter weight, at the cost of being less useful than objects), so let’s consider C++.

    For apps that generate lots of objects, java is actually faster than C++ because java’s automatic memory management is just so much better than what most people in most real applications can do by hand nowadays.

    The main complaint from people who seriously look at languages is that java usually uses more memory than the equivalent C++ app (and both use more than if you wrote it in C). Depending on the context (e.g. supercomputer versus tiny embedded device), this memory difference may/may not matter.

    But memory intensive apps are a performance win for java.

    Where C/C++ can still win over java is in certain numerical apps where hand tuned optimizations for certain cpus can make a difference.

    The biggie is even jdk 6 still does not make great use of vector type of instructions; here are some links:
    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6340864
    http://forum.java.sun.com/thread.jspa?threadID=5148725&messageID=9564641
    http://forums.java.net/jive/thread.jspa?threadID=18675
    This will undoubtedly be rectified in the future, but for now it sometimes can be an issue.

    But C/C++’s days are numbered; the sweet spot of apps for where you really need C/C++ has and continues to shrink a lot to java’s favor. And java is so much more productive to program in, crossplatform, maintainable, etc.

  22. C/C++ is compiled, and optimised against the target CPU (if done properly)
    Java is turned into bytecode and executed on a virtual machine.
    There is no way that a java ‘program’ can be as fast unless the C/C++ coder has done something very stupid!
    I see no advantage to using java except maybe through pure laziness. With gcc, and sensible coding, it is possible to produce cross platform code relatively easily in C/C++.

  23. “C/C++ is compiled, and optimised against the target CPU (if done properly)
    Java is turned into bytecode and executed on a virtual machine.
    There is no way that a java ‘program’ can be as fast unless the C/C++ coder has done something very stupid!”

    C/C++ is compiled – STATICALLY – against a range of CPUs. Java bytecode is loaded at runtime, quickly JIT compiled, and then compiled dynamically as the program “ages.” This means that Java code can not only be compiled against a very specific processor, but it can also optimise things like “branch prediction,” because it can watch the program run for a while before it optimally compiles the code. This is why the “-server” JVM waits until a piece of code has been called at least 10,000 times (by default) before compiling it to machine code.

    One of the main reasons that Java can be faster than C++ in real world applications is that malloc is a very slow way of managing memory. Predictable, yes. Fast, no. Java, by default uses very fast, very low overhead allocation and collection algorithms. Although C/C++ can allocate some memory on the stack, the heap portion kills it. There are alternate memory management schemes for C/C++, and some of these are much faster than the default malloc.

    The bottom line here is that with Java 6, the running code is very close in performance to C/C++. With Java 6u10, the startup penalty has been significantly reduced. I use some very heavyweight Java applications in my day-to-day work. Eclipse (Java on SWT), Oxygen Author (Java Swing), and NetBeans (Java Swing), to name a few. These are seriously complex applications that are quite usable.

    In my current work, we have an application that is composed of about 50 individual projects, and has to be customized at the code level for each of a dozen or so customers. We’d never be able to handle the kind of complexity we have without Java (or without a whole lot more very talented programmers). Every time I have looked at integrating C/C++ code to speed things up, I have found that Java seems to be about the same speed. Why add more complexity?

    “I see no advantage to using java except maybe through pure laziness. With gcc, and sensible coding, it is possible to produce cross platform code relatively easily in C/C++.”

    Take a look at .NET. Microsoft has put many billions behind this Java-like project. There are significant advantages to be gained from the programming model used by Java. But you would have to not be “lazy,” and actually go and study Java and/or .NET with an open mind. :-)