InputStream buffering
On the previous pages, we've seen how to read bytes from an InputStream
and how to handle I/O errors correctly while reading. So far, our examples have
read from the stream one byte at a time. For many types of InputStream, this
turns out to be inefficient.
Calling read() for every single byte on a FileInputStream (and many other types
of input streams) means that Java makes a call to a native operating system method for every single byte.
And calling native methods from Java is often a relatively expensive operation (see note 1 below).
(Most operating systems do some amount of underlying buffering, so that there isn't, for
example, a separate disk read for every single byte read. So in many cases, the big problem is
essentially the cost of the OS call itself.)
You'll recall that InputStream also provides versions of read() for reading multiple
bytes into an array. Provided that the subclass in question actually optimises these methods (as,
for example, FileInputStream does), we can make a single OS call to read multiple bytes. So in our example
of checking whether a file is a JPEG file, we could create a small byte array, fill it in a single
call, and then check whether the bytes in the array match the signature of a JPEG file.
In practice, using the multi-byte read() calls in this way isn't always convenient. For one
thing, these calls aren't guaranteed to read the requested number of bytes, even if available. So we still need
to sit in a loop, reading until we have filled the array (even though in practice, we probably would read
the first four bytes in one go).
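A minimal sketch of such a loop follows. The helper name readFully and its behaviour of throwing EOFException on a short stream are our own illustrative choices; the standard library offers similar functionality in DataInputStream.readFully() and, from Java 9 onwards, InputStream.readNBytes(byte[], int, int) (which returns the number of bytes read rather than throwing at end of stream).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyExample {

    // Hypothetical helper: keeps calling the multi-byte read() until buf is
    // full, because a single call may return fewer bytes than requested.
    // Throws EOFException if the stream ends before buf has been filled.
    public static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n == -1) {
                throw new EOFException("Stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        byte[] header = new byte[4];
        readFully(in, header);
        System.out.println(Arrays.toString(header)); // [10, 20, 30, 40]
    }
}
```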
Buffering with BufferedInputStream
To make life a bit easier, Java provides a "wrapper" input stream called BufferedInputStream.
This is constructed around another base input stream such as FileInputStream, but buffers reads in
the background. That is, we can call the single-byte version of read(), and BufferedInputStream
will behind the scenes read multiple bytes from the file into a buffer and then serve us bytes from the buffer.
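To make this concrete, the sketch below uses a hypothetical CountingInputStream (our own illustration, not a JDK class) that simply counts how many read requests reach the underlying stream, standing in for the native OS calls. It then reads 100,000 bytes one at a time, with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {

    // Hypothetical helper: counts how many read requests reach the
    // wrapped stream (a stand-in for expensive native calls).
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads 'size' bytes one at a time and returns how many calls
    // reached the underlying (counting) stream.
    public static int countUnderlyingCalls(int size, boolean buffered) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(counter) : counter;
        while (in.read() != -1) {
            // discard each byte; only the call count matters here
        }
        in.close();
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Unbuffered: " + countUnderlyingCalls(100_000, false));
        System.out.println("Buffered:   " + countUnderlyingCalls(100_000, true));
    }
}
```

Without buffering, every single read() reaches the underlying stream (100,001 calls, including the final one that returns -1). With buffering, the count drops to a handful, because each request to the underlying stream fetches a whole buffer's worth of bytes; the exact figure depends on the JDK's default buffer size.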
This "wrapper" model means that we can add buffering with a single line of code:
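import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public boolean isJpegFile(File f) throws IOException {
    InputStream in = new FileInputStream(f);
    // The single extra line that adds buffering:
    in = new BufferedInputStream(in);
    try {
        // JPEG files begin with the start-of-image marker 0xFF 0xD8,
        // followed by the 0xFF that introduces the next marker
        return (in.read() == 0xFF &&
                in.read() == 0xD8 &&
                in.read() == 0xFF);
    } finally {
        // Closing the BufferedInputStream also closes the FileInputStream
        try { in.close(); } catch (IOException ignore) {}
    }
}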
In practice, it's common to chain the constructors together to make the code a bit more elegant:
InputStream in = new BufferedInputStream(new FileInputStream(f));
Should the buffer construction go inside or outside the try/finally block?
Note that in the above example, we assume that no error will occur while
constructing the BufferedInputStream after we've created the FileInputStream.
If one really did occur, then with the above structure the finally clause would not be
executed, and the file would not be closed promptly. However, this course of events is so
unlikely that we'd probably live with it. The file would still eventually get closed:
FileInputStream has a fallback mechanism (a finalize() method in older Java versions,
replaced by an equivalent cleanup mechanism in more recent ones) which "in emergencies"
closes the file when the stream is garbage collected. In any case, the most likely cause of
being unable to create a BufferedInputStream is an OutOfMemoryError, and if one of
those occurs, you've got much more to worry about than the timely closure of a file...
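Since Java 7, the try-with-resources statement removes this dilemma altogether. A minimal sketch, using a hypothetical startsWith() helper that checks whether a file begins with given byte values: because the FileInputStream and the BufferedInputStream are declared as separate resources, the FileInputStream is closed even if constructing the BufferedInputStream were to fail.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SignatureCheck {

    // Hypothetical helper: returns true if the file begins with the given
    // byte values (each in the range 0-255).
    public static boolean startsWith(File f, int... signature) throws IOException {
        // Two separate resources: if the BufferedInputStream constructor
        // threw, the already-opened FileInputStream would still be closed.
        // Both are closed automatically, in reverse order, on exit.
        try (InputStream fileIn = new FileInputStream(f);
             InputStream in = new BufferedInputStream(fileIn)) {
            for (int expected : signature) {
                if (in.read() != expected) {
                    return false; // mismatch, or end of file (-1)
                }
            }
            return true;
        }
    }
}
```

For example, SignatureCheck.startsWith(new File("photo.jpg"), 0xFF, 0xD8, 0xFF) would check for the JPEG start bytes (the file name is purely illustrative).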
When should you use BufferedInputStream?
As a general rule of thumb, you should always wrap an input stream in a BufferedInputStream
except where you know there's other buffering going on. For example, if you are
calling the multi-byte reads on a FileInputStream and reading a reasonably large number of
bytes (say, a few kilobytes) at a time, then adding an extra BufferedInputStream
probably won't give you much benefit and could even slow down your I/O slightly due to the
extra buffer copying. But if you're not sure, the penalty for not buffering is generally much
greater than the penalty of an unnecessary extra layer of buffering.
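As a sketch of the "other buffering" case: a loop that always reads in chunks of several kilobytes via the multi-byte read() already amortises the per-call cost, so the stream can reasonably be used without a BufferedInputStream. The method name and the 8 KB chunk size are illustrative choices, not recommendations.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedRead {

    // Reads the stream in 8 KB chunks using the multi-byte read(). Each
    // call already requests many bytes, so wrapping 'in' in a
    // BufferedInputStream would add little here.
    public static long countBytes(InputStream in) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In real code 'in' would typically be a FileInputStream
        InputStream in = new ByteArrayInputStream(new byte[20_000]);
        System.out.println(countBytes(in)); // 20000
    }
}
```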
Java's model of implementing I/O buffering as an extra InputStream in the chain is
generally quite neat: as you saw above, it means we can add I/O buffering to arbitrary
code with a single line. A disadvantage is that library methods that say they take an
InputStream for input don't consistently state whether they then buffer the input
or whether they expect the input to be pre-buffered. Similarly, it's not always clear whether
some unspecified flavour of InputStream returned by a method already implements buffering.
If in doubt, buffer.
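One way to apply this rule to a stream of unknown origin is a small helper like the hypothetical ensureBuffered() below (not a JDK method). It can only recognise buffering it knows about: here, an existing BufferedInputStream, or a ByteArrayInputStream, which reads from memory anyway. Any other stream gets wrapped, even if it happens to buffer internally.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class Buffering {

    // Hypothetical helper: wraps 'in' in a BufferedInputStream unless it is
    // recognisably already buffered or memory-backed.
    public static InputStream ensureBuffered(InputStream in) {
        if (in instanceof BufferedInputStream || in instanceof ByteArrayInputStream) {
            return in;
        }
        return new BufferedInputStream(in);
    }
}
```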
Configuring the buffer size
When creating a BufferedInputStream, it is possible to specify the buffer size
in bytes. The buffer size can affect both the overall read time (requesting a larger number
of bytes at a time can involve fewer "round trips" to a hard disk or network to request
data) and CPU time (the more data asked for at a time, the less time proportionally is
likely to be spent inside the "housekeeping code" around each data request). It turns out
that the default buffer size (8192 bytes in current OpenJDK releases) is generally a good choice;
choosing an input buffer size with BufferedInputStream is covered in more detail on a separate page.
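The buffer size is passed as the second argument to the BufferedInputStream constructor. A minimal sketch follows; the 64 KB figure is purely illustrative, and the constructor rejects sizes of zero or less with an IllegalArgumentException.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferSizeExample {

    // Illustrative size only; the default is usually a good choice.
    static final int BUFFER_SIZE = 64 * 1024;

    // Wraps 'raw' with an explicitly sized buffer, then counts the bytes
    // while reading them one at a time.
    public static long countBytes(InputStream raw, int bufferSize) throws IOException {
        try (InputStream in = new BufferedInputStream(raw, bufferSize)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(new byte[200_000]);
        System.out.println(countBytes(raw, BUFFER_SIZE)); // 200000
    }
}
```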
Notes:
1. There are some exceptions to this: some very basic native methods, such as the various functions in
java.lang.Math, actually get converted "directly" into machine instructions by modern VMs.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.