The overhead of native calls in Java

Compared to a pure Java method call, calling a user-written native method usually has a significant overhead. The reasons for this have largely to do with optimisations that the JVM can apply to regular Java methods but cannot apply to native ones.

What all this boils down to is that as of Hotspot 1.6.0, a call to a native method takes just over 200 clock cycles. Some more precise timings I made on a 1.86 GHz Pentium under Windows XP are shown in the following table (see note 1). I took timings for calls to three different static native methods, which took one, three and five integer parameters respectively. As the figures show, the majority of the overhead is in the act of making a native call per se rather than in placing individual parameters on the stack:

No. of int parameters     Clock cycles
to native method          per JNI call
1                         234
3                         239
5                         244

Table: JNI call overhead under Windows XP
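
The kind of measurement behind these figures can be sketched roughly as follows. This is only a minimal illustration, not the code used for the timings above: the class CallTimer, its method names and the run counts are hypothetical, and a plain lambda stands in for the native method so that the sketch runs without a compiled native library. To time a real native call, you would call the native method directly inside the loop.

```java
import java.util.function.IntSupplier;

final class CallTimer {
    // Results are written here so the JIT cannot discard the timed calls.
    static volatile long sink;

    /** Converts nanoseconds per call to clock cycles at the given clock rate in GHz. */
    static double nanosToCycles(double nanosPerCall, double clockGHz) {
        return nanosPerCall * clockGHz; // 1 GHz is one cycle per nanosecond
    }

    /** Mean nanoseconds per call over the timed runs, ignoring the warm-up runs. */
    static double meanNanosPerCall(IntSupplier call, int warmupRuns,
                                   int timedRuns, int callsPerRun) {
        long total = 0;
        double timedNanos = 0;
        for (int run = 0; run < warmupRuns + timedRuns; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < callsPerRun; i++) {
                total += call.getAsInt();
            }
            long elapsed = System.nanoTime() - start;
            if (run >= warmupRuns) {
                timedNanos += elapsed; // only count runs after warm-up
            }
        }
        sink = total;
        return timedNanos / ((double) timedRuns * callsPerRun);
    }

    public static void main(String[] args) {
        // Stand-in for a static native method that returns a constant.
        IntSupplier standIn = () -> 7;
        double nanos = meanNanosPerCall(standIn, 5, 10, 1_000_000);
        System.out.printf("%.3f ns/call, about %.1f cycles at 1.86 GHz%n",
                nanos, nanosToCycles(nanos, 1.86));
    }
}
```

Note that the figure this produces includes the cost of the loop and of the call through IntSupplier, so it is an upper bound on the call itself rather than a precise measurement.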

So, how good or bad is 200 clock cycles? Well, for an occasionally-called method that in turn calls a Windows API function, this overhead of the Java/native interface is surely negligible. The cases where more consideration is needed are, for example, methods performing mathematical operations that we might have nativised in the hope of a speedup. We must take into account, for example, that simple arithmetic instructions typically take only a few clock cycles each (see note 2).

So this means that a native method performing a few simple operations on its parameters probably won't be worthwhile.
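
To put a rough number on this, the sketch below estimates how many simple instructions fit into the overhead of a single native call. The class and method names are hypothetical, and the figure of 2 cycles per instruction is simply taken from note 2 as an assumption; real instruction timings vary with the CPU and the surrounding code.

```java
final class NativeBreakEven {
    /** Simple instructions that could run in the time of one native call's overhead. */
    static long instructionsPerCallOverhead(int overheadCycles, double cyclesPerInstruction) {
        return (long) Math.floor(overheadCycles / cyclesPerInstruction);
    }

    public static void main(String[] args) {
        // 234 cycles: measured overhead for the one-parameter call in the table.
        // 2.0 cycles per instruction: assumed figure from note 2.
        System.out.println(instructionsPerCallOverhead(234, 2.0)); // prints 117
    }
}
```

On these assumptions, a native method would need to save the equivalent of over a hundred simple instructions per call just to break even.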

Native methods in the standard libraries

The eagle-eyed will have noticed various native methods in the JDK libraries that perform relatively simple tasks. For example, ByteBuffer.put() writes a single byte to memory; we really don't want a 200+ clock cycle overhead on such a simple method.

For this reason, native methods in the standard library don't necessarily go through the JNI but can actually be treated specially by the JIT compiler. For example, under Hotspot (and presumably other good JIT-compiling JVMs), the various ByteBuffer methods are actually compiled directly to single machine instructions as appropriate.
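
As an illustration of the kind of code that benefits, here is a minimal sketch that writes bytes to a direct ByteBuffer one call at a time and reads them back. The class DirectBufferDemo and its method are hypothetical names for this example; whether each put() and get() really ends up as a single machine instruction depends on the JVM and its JIT compiler.

```java
import java.nio.ByteBuffer;

final class DirectBufferDemo {
    /** Writes 0..count-1 as bytes into a direct buffer and returns the sum read back. */
    static int writeAndSum(int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count);
        for (int i = 0; i < count; i++) {
            buf.put((byte) i); // relative put: one byte per call
        }
        buf.flip(); // switch from writing to reading
        int sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get(); // relative get: one byte per call
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(writeAndSum(10)); // prints 45
    }
}
```

If every one of these calls cost 200+ clock cycles, a loop like this would be dominated by call overhead rather than by the memory accesses themselves.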


1. The native method in question simply returned a constant value. You should always take timings such as these with appropriate quantities of salt: they're quite difficult to make reliably. I took reasonable precautions (taking nanosecond timings of a large number of repeated calls; taking mean measurements of a number of runs; ignoring measurements while the JVM was "warming up") and encouragingly, the actual calculated number of cycles/call came extremely close to a whole number of cycles (for example, in the last case, the actual calculated value was 244.032 clock cycles to 3 decimal places).
2. On modern CPU architectures, the number of clock cycles required by a given instruction is a little complex because it depends, for example, on how quickly the required data is made available to the given part(s) of the CPU and on those components becoming available; these factors in turn depend on surrounding instructions. But for example, a series of additions on registers can typically run at the "burst" speed of 2 clock cycles per instruction.