Introduction to networking in Java
In this section, we will look at how to perform common networking
operations in Java. Much of networked I/O in Java works in the same way as
Java I/O generally: once a connection is open, we read and write via ordinary streams.
Example: how to download data from a URL
One of the most common networked operations that people want to perform in Java is
to download data from a particular URL. This is generally a straightforward task. In its
simplest form, the general procedure is as follows:
- construct a URL object representing the URL that
data is to be retrieved from;
- call openConnection() on this URL to retrieve a URLConnection object;
- on this connection object, call getInputStream() to get an InputStream object;
- use the InputStream as normal to read data, bearing in mind the issues that
apply to any input stream, such as the need for buffering, and character encoding
if we're translating the bytes into characters.
Constructing a URL object
We can construct a URL object simply by passing it the string representation
of the URL, as would appear in a browser address bar:
try {
    URL ur = new URL("http://www.mydomain.com/myfile.gif");
    // do something with the URL...
} catch (IOException ioex) {
    ...
}
Notice that we catch IOException. Constructing a URL can throw
MalformedURLException, which is a subclass of IOException. Since we're likely
to go on to connect to the URL (an operation that can also throw
IOException), it's often simpler to just catch IOException
around the whole operation.
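If we do want to treat a badly formed URL differently from a genuine I/O failure, we can catch MalformedURLException on its own. The following is only a minimal sketch: the class and method names (UrlParsing, parseOrNull) are made up for illustration. Note also that from Java 20 onwards the URL constructors are deprecated in favour of URI.create(...).toURL(), although they still work.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParsing {
    /** Returns a URL for the given string, or null if it is not well formed. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated from Java 20
    static URL parseOrNull(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            // Thrown, for example, when the protocol is missing or unknown
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("http://www.mydomain.com/myfile.gif"));
        System.out.println(parseOrNull("not a url"));
    }
}
```

Constructing the URL does not contact the server, so a non-null result only tells us that the string could be parsed, not that the resource exists.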
Reading binary data from a URL
If the URL points to binary data, such as an image, then we essentially want
to follow the above pattern, but pull out "raw bytes" from the input stream. If we want to get the
bytes into a byte array, then we can use the help of ByteArrayOutputStream. This
class lets us feed it successive bytes, then at the end call toByteArray(). So the
code could look as follows:
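// Needs java.io.BufferedInputStream, ByteArrayOutputStream, IOException, InputStream
// and java.net.URL, URLConnection
public static byte[] getBinaryURLContent(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    InputStream in = new BufferedInputStream(conn.getInputStream());
    try {
        // Initial capacity is a rough guess; the buffer grows as needed
        ByteArrayOutputStream bout = new ByteArrayOutputStream(10000);
        int b;
        // read() returns -1 once the end of the stream is reached
        while ((b = in.read()) != -1) {
            bout.write(b);
        }
        return bout.toByteArray();
    } finally {
        in.close();
    }
}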
Notice that:
- once we've called openConnection() and then getInputStream(), we
effectively proceed as though reading from any boring old input stream— at this point, there's nothing
very special about the fact that the stream is coming via a URL connection;
- this means that, like any InputStream, we should buffer
the input (via BufferedInputStream in this case);
- with the buffering in place, we just read one byte at a time from
the (buffered) stream; there may be slightly more optimal ways to read the data (e.g. by reading
an array of bytes each time, we avoid the potential overhead of a method call per byte),
but this simple method is good enough for most purposes;
- as always, we need to close the stream in a finally clause;
- in this simple example, we make a rough guess as to the amount of data we're expecting
(if the server provides it, we can actually query the URLConnection for the
content length).
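The last two points can be combined into a variant that reads a block of bytes at a time and sizes the output buffer from the content length when the server supplies one. This is a sketch rather than a definitive implementation: the class name BlockDownload and the 8192-byte chunk size are arbitrary choices, and it uses try-with-resources (Java 7 onwards) in place of the explicit finally clause.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class BlockDownload {
    static byte[] getBinaryURLContent(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        // getContentLength() returns -1 if the length is unknown
        int expected = conn.getContentLength();
        int initialSize = (expected > 0) ? expected : 10000;
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream bout = new ByteArrayOutputStream(initialSize);
            byte[] chunk = new byte[8192];
            int n;
            // read(byte[]) returns the number of bytes read, or -1 at end of stream
            while ((n = in.read(chunk)) != -1) {
                bout.write(chunk, 0, n);
            }
            return bout.toByteArray();
        }
    }
}
```

Because we read into our own array, there is no need for a BufferedInputStream here. The content length is only used as a hint for the initial buffer size, so an inaccurate value from the server does not affect correctness.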
Closing the URLConnection?
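URLConnection itself has no close() method: the resources behind the connection are
tied to its streams, so closing the InputStream is what releases them. It's therefore
sufficient in this case to just close the InputStream. (For HTTP connections,
HttpURLConnection additionally provides a disconnect() method, but we don't normally
need to call it.)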
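For the cases where we do want to release an HTTP connection explicitly, HttpURLConnection provides a disconnect() method. The sketch below shows one way this might be arranged; the class name FetchAndRelease is made up, and readAllBytes() requires Java 9 or later. Calling disconnect() may prevent the underlying socket from being reused for later requests to the same server, so treat it as optional.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class FetchAndRelease {
    static byte[] fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        // The stream is closed automatically before the finally block runs
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes(); // Java 9+
        } finally {
            // Only HTTP(S) connections have a disconnect() method
            if (conn instanceof HttpURLConnection) {
                ((HttpURLConnection) conn).disconnect();
            }
        }
    }
}
```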
Reading the contents of a URL as a string (or CharSequence)
Downloading the content of a URL to a string is a common requirement, and is not
much different from the binary data case just examined. Essentially, we need to read
characters from the URL stream and append them to
a string (or in fact, a string buffer of some kind). As of Java 5, we can
use a StringBuilder, which is essentially a non-synchronized StringBuffer.
Apart from the destination of the characters, a key issue is character encoding:
that is, the scheme by which bytes are "mapped" to characters. If we're really lucky,
the server will tell us which encoding it uses via the charset parameter of the
Content-Type header (e.g. text/html; charset=UTF-8), which we can read
with getContentType(). Despite its name, getContentEncoding() is not the right method
here: it returns the Content-Encoding header, which describes compression schemes such
as gzip rather than the character set. We must also be prepared for the possibility that
no charset is given, in which case we need to make an assumption
of some kind. For simplicity, we'll just assume a default encoding of ISO-8859-1
(another common encoding scheme being UTF-8):
public static CharSequence getURLContent(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    String encoding = "ISO-8859-1"; // default if the server gives no charset
    String contentType = conn.getContentType();
    if (contentType != null) {
        // e.g. "text/html; charset=UTF-8"
        for (String param : contentType.split(";")) {
            param = param.trim();
            if (param.toLowerCase().startsWith("charset=")) {
                encoding = param.substring(8).replace("\"", "").trim();
            }
        }
    }
    BufferedReader br = new BufferedReader(new
        InputStreamReader(conn.getInputStream(), encoding));
    StringBuilder sb = new StringBuilder(16384);
    try {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
            sb.append('\n');
        }
    } finally {
        br.close();
    }
    return sb;
}
Note some other points:
- we posed the original problem as needing to "download to a string", but in fact,
there's really no need to return a String: usually, the thing that the caller
of this method will need is just "some character sequence or other", so we may as
well declare the method as returning a CharSequence implementation (and in
fact, just return the StringBuilder that we were writing to)— if the
caller really wants a String, they can soon call toString() on the
CharSequence returned;
- I also chose to use the readLine() method to read line by line,
and then add a specific line break character (in this case \n)
after each line: this has the effect of normalising line breaks
(depending on the server system and/or file in question, line breaks could be
marked in different ways, but BufferedReader deals with the different
possibilities); note that it also means a line break is appended after the final
line, even if the original content didn't end with one.
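To see the normalising effect of readLine() in isolation, here is a small sketch that reads from an in-memory string instead of a URL. The class and method names (LineBreaks, normalise) are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineBreaks {
    /** Reads all lines from the source and rejoins them with '\n'. */
    static String normalise(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() strips the terminator, whether \n, \r or \r\n
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Windows (\r\n), old Mac (\r) and Unix (\n) line breaks mixed together
        String mixed = "one\r\ntwo\rthree\nfour";
        System.out.print(normalise(new StringReader(mixed)));
    }
}
```

Each of the three styles of line break comes out as a single \n, and a \n is also added after the last line, which had no terminator in the input.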
In practice, I've also found readLine() to give slightly better
performance: possibly because the JVM can compile this whole method and avoid a
method call per character.
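Returning to the question of character encoding: over HTTP, the character set is normally given by the charset parameter of the Content-Type header (for example text/html; charset=UTF-8). A reusable helper for picking it out of the value returned by getContentType() might look like the sketch below. The names ContentTypeCharset and charsetOf are hypothetical, and the parsing is deliberately simple rather than a full implementation of the header grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    /** Returns the charset named in a Content-Type value, or the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for a "charset=" prefix (8 characters)
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1);
        System.out.println(cs);
    }
}
```

The returned Charset can be passed directly to the InputStreamReader constructor in place of an encoding name, which also avoids an UnsupportedEncodingException when the server names a charset that the JVM doesn't know.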
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.