The overhead of native calls in Java
Compared to a pure Java method call, calling a user-written native method usually
has a significant overhead. This is largely down to optimisations that the JVM
can make for regular Java methods but not for native ones:
- the JVM can't inline the native method;
- the JVM doesn't know enough about the method to make optimisations
that it could make when compiling a regular Java method (for example,
it has to assume that all of the parameters passed in are always used);
- the JVM can't make other optimisations that it could make if it were
dynamically compiling the code (e.g. compiling a constant parameter as a constant
operand to a machine instruction rather than placing it on the stack and
reading it off again);
- in order to make the call into the DLL or library, the JVM may have to
perform extra work, such as rearranging items on the stack.
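To make the contrast concrete, here is a minimal sketch of the two kinds of method being compared. The class and method names are hypothetical, as is the library name mentioned in the comment; the native method is only declared, never called, so the class runs without any DLL being present.

```java
public class NativeVersusJava {
    // Hypothetical user-written native method. Its body would live in a
    // DLL/shared library loaded with something like
    // System.loadLibrary("mynative") (the library name is an assumption).
    // The JIT compiler cannot see inside it, so it cannot inline the call
    // or reason about how the parameters are used.
    static native int addNative(int a, int b);

    // Pure Java equivalent. The JIT compiler can see the whole body, so it
    // is free to inline the call and fold constant arguments.
    static int addJava(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Only the pure Java version is called here; calling addNative
        // without its library loaded would throw UnsatisfiedLinkError.
        System.out.println(addJava(2, 3));
    }
}
```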
What all this boils down to is that as of Hotspot 1.6.0, a call to a native method
takes just over 200 clock cycles. Some more precise timings I made
on a 1.86 GHz Pentium under Windows XP are shown in the following table [1].
I took timings for calls to three different static native methods, which took
one, three and five integer parameters respectively. As the figures show, the
majority of the overhead is in the act of making a native call per se rather
than in placing individual parameters on the stack:
[Table: JNI call overhead under Windows XP. Columns: no. of int parameters to native method; clock cycles / JNI call.]
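For readers who want to take similar measurements, the following is a rough sketch of a timing harness: it discards some warm-up runs, times a large number of repeated calls with System.nanoTime(), averages over several runs, and converts nanoseconds to clock cycles. It is not the code used for the timings above. The class name, the run counts and methodUnderTest() are assumptions, and methodUnderTest() is a pure Java stand-in that you would replace with a call to your own native method.

```java
public class CallOverheadTimer {
    // Clock speed of the test machine in GHz (1.86 GHz as in the text;
    // change this to match your own hardware).
    static final double CLOCK_GHZ = 1.86;

    // Written to after each run so the JIT cannot discard the loop.
    static volatile int sink;

    // Hypothetical stand-in: replace the body with a call to the native
    // method you want to time. As pure Java, it will probably be inlined.
    static int methodUnderTest(int a) {
        return 42;
    }

    // cycles = nanoseconds * (cycles per nanosecond), i.e. ns * GHz.
    static double nanosToCycles(double nanosPerCall, double clockGhz) {
        return nanosPerCall * clockGhz;
    }

    // Times 'calls' consecutive calls and returns the mean ns per call.
    static double meanNanosPerCall(int calls) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += methodUnderTest(i);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        final int calls = 10_000_000;
        final int warmupRuns = 5;    // ignored while the JVM warms up
        final int measuredRuns = 10; // averaged to give the final figure

        for (int i = 0; i < warmupRuns; i++) {
            meanNanosPerCall(calls);
        }
        double total = 0;
        for (int i = 0; i < measuredRuns; i++) {
            total += meanNanosPerCall(calls);
        }
        double meanNanos = total / measuredRuns;
        System.out.printf("%.3f ns/call, approx. %.3f cycles/call%n",
                meanNanos, nanosToCycles(meanNanos, CLOCK_GHZ));
    }
}
```

As a sanity check on the conversion, 200 clock cycles at 1.86 GHz corresponds to roughly 107.5 nanoseconds per call. The figure includes the cost of the loop itself, so treat the result as an upper bound on the call overhead.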
So, how good or bad is 200 clock cycles? Well, for an occasionally-called method
that in turn makes a Windows API call, this overhead of the Java/native interface
is surely negligible. The cases where more consideration is needed are, for example,
methods performing mathematical operations that we might have nativised in the hope
of a speedup. We must take into account, for example, that:
- a basic arithmetic operation typically
takes 2 clock cycles or thereabouts [2] on Intel hardware;
- in many cases (e.g. a method that performs a simple operation
on its parameters), Hotspot and other modern JVMs do a very good
job of optimising away the cost of a pure Java method call.
So this means that a native method performing a few simple
operations on its parameters probably won't be worthwhile.
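The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope cost model, not a measurement: it takes the figures quoted above (about 200 cycles of call overhead, about 2 cycles per basic operation) and adds a purely hypothetical assumption that the native code runs each operation some fixed factor faster than the Java version. The class, method and parameter names are illustrative.

```java
public class NativeBreakEven {
    // Estimated cost in cycles of doing 'ops' basic operations in pure
    // Java, assuming the method call itself is optimised away.
    static double javaCost(int ops, double cyclesPerOp) {
        return ops * cyclesPerOp;
    }

    // Estimated cost in cycles of the same work behind a native call:
    // the fixed call overhead plus the work itself, which is assumed
    // (hypothetically) to run 'speedup' times faster than in Java.
    static double nativeCost(int ops, double cyclesPerOp,
                             double overheadCycles, double speedup) {
        return overheadCycles + (ops * cyclesPerOp) / speedup;
    }

    public static void main(String[] args) {
        final double cyclesPerOp = 2.0;      // figure quoted in the text
        final double overheadCycles = 200.0; // figure quoted in the text
        final double speedup = 2.0;          // hypothetical assumption

        for (int ops : new int[] {5, 50, 200, 1000}) {
            System.out.printf("%4d ops: Java %.0f cycles, native %.0f cycles%n",
                    ops,
                    javaCost(ops, cyclesPerOp),
                    nativeCost(ops, cyclesPerOp, overheadCycles, speedup));
        }
    }
}
```

Under these assumptions, the call overhead alone is worth about 100 basic operations, and with a twofold speedup the native version only breaks even at around 200 operations per call.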
Native methods in the standard libraries
The eagle-eyed will have noticed various native methods in the JDK libraries
that perform relatively simple tasks. For example, ByteBuffer.put() writes
a single byte to memory; we really don't want a 200+ clock cycle overhead to
such a simple method.
For this reason, native methods in the standard library don't necessarily
go through the JNI but can actually be treated specially by the JIT compiler.
For example, under Hotspot (and presumably other good JIT-compiling JVMs),
the various ByteBuffer methods are actually compiled directly
to single machine instructions as appropriate.
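As an illustration of the kind of code that benefits, here is a small sketch that writes and reads individual bytes through a direct ByteBuffer. The class and method names are made up for the example; the ByteBuffer calls are the standard java.nio ones. Whether a particular JVM compiles each put() and get() down to a single instruction is up to that JVM, so take this as an example of usage rather than a guarantee about the generated code.

```java
import java.nio.ByteBuffer;

public class DirectBufferExample {
    // Fills a direct buffer with the byte values 0, 1, 2, ... and then
    // reads them back, returning their sum.
    static int fillAndSum(int size) {
        // A direct buffer lives outside the Java heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i++) {
            buf.put((byte) i); // relative put: writes one byte, advances position
        }
        buf.flip(); // switch from writing to reading

        int sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get(); // relative get: reads one byte, advances position
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(16));
    }
}
```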
1. The native method in question simply returned a constant
value. You should always take timings such as these with appropriate
quantities of salt: they're quite difficult to make reliably. I took reasonable precautions
(taking nanosecond timings of a large number of repeated calls; taking mean measurements
of a number of runs; ignoring measurements while the JVM was "warming up") and encouragingly,
the actual calculated number of cycles/call came extremely close to a whole number
of cycles (for example, in the last case, the actual calculated value was
244.032 clock cycles to 3 decimal places).
2. On modern CPU architectures, the number of clock cycles required by a given
instruction is a little complex to state because it depends, for example, on how quickly the required
data is made available to the given part(s) of the CPU and on those components becoming
available; these factors in turn depend on surrounding
instructions. But for example, a series of additions on registers can typically
run at the "burst" speed of 2 clock cycles per instruction.