XML parsing in Java with XPath

In case you're not too familiar with XML, we'll start with a brief overview of a typical, simple XML document. We won't get bogged down in the less common features of XML, but will concentrate on the most common ones. Then, as a basis for parsing the XML document, we're going to use XPath, which is essentially a scheme for referring to parts of an XML document as though it were a file system.

XML overview

Here's a typical, simple, small XML document:

<?xml version="1.0" encoding="iso-8859-1"?>
<configuration>
  <maxConnections>100</maxConnections>
  <minConnections ignorable="true">10</minConnections>
  <extraParams/>
</configuration>

If you're familiar with HTML, various features of XML will appear familiar. Here are the main features:

the document usually starts with a header (the XML declaration), which indicates that an XML document follows and specifies the character encoding;
a document consists of a number of nodes, which generally have a start and end tag;
each node may have text content (e.g. the text of the maxConnections node is 100);
each node may also have a number of attributes (here, the minConnections node has an ignorable attribute whose value is true);
if a node has no text, its start and end tags may be combined, as in the extraParams node here.
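To make these features concrete, here is a minimal sketch that parses the sample document with the JDK's standard DOM parser (javax.xml.parsers and org.w3c.dom) and reads back the root node, some text content, an attribute and the empty extraParams node. This is plain DOM access rather than XPath, and the class and helper names (XmlFeaturesDemo, parse, element) are hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlFeaturesDemo {
    // The sample configuration document from above, held in a string
    static final String CONFIG_XML =
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
        + "<configuration>\n"
        + "  <maxConnections>100</maxConnections>\n"
        + "  <minConnections ignorable=\"true\">10</minConnections>\n"
        + "  <extraParams/>\n"
        + "</configuration>\n";

    // Parses XML text into a DOM tree; checked exceptions are wrapped for brevity
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the first element with the given tag name
    static Element element(Document doc, String tag) {
        return (Element) doc.getElementsByTagName(tag).item(0);
    }

    public static void main(String[] args) {
        Document doc = parse(CONFIG_XML);
        // The single root node
        System.out.println(doc.getDocumentElement().getTagName());
        // Text content of a node
        System.out.println(element(doc, "maxConnections").getTextContent());
        // The value of an attribute
        System.out.println(element(doc, "minConnections").getAttribute("ignorable"));
        // A combined start/end tag produces an element with no children
        System.out.println(element(doc, "extraParams").hasChildNodes());
    }
}
```

Running main should print configuration, 100, true and false, one per line.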

Compared to typical HTML, XML has a slightly "tighter" format:

every document must have exactly one root node;
all nodes must have properly matched tags; it's not possible to have a series of start tags without end tags, as is typical with li and p nodes in HTML.
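One way to see these rules in action is to hand some text to the JDK's DOM parser and check whether it is accepted. The sketch below is a hypothetical helper (WellFormedCheck.isWellFormed is not a standard API). It installs a SAX DefaultHandler as the error handler, whose fatalError method rethrows the exception; this also stops the default handler printing "[Fatal Error]" messages to the console.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the text parses as a well-formed XML document
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors, so parse() fails with a SAXException
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One root node, properly matched tags
        System.out.println(isWellFormed("<config><a>1</a></config>"));
        // Two root nodes
        System.out.println(isWellFormed("<a>1</a><b>2</b>"));
        // HTML-style li tags with no end tags
        System.out.println(isWellFormed("<ul><li>one<li>two</ul>"));
    }
}
```

Only the first call should print true: the second document has two root nodes and the third has unmatched li tags, which an HTML browser would tolerate but an XML parser rejects.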

XML documents can have other features that we won't get bogged down in here, such as namespaces and document type definitions (essentially a means for a document to be validated when it is read).

XPath

As mentioned, XPath is a scheme for accessing parts of an XML document as though it were a file system. For relatively short documents where you need "random access" to an XML file, it's usually the most practical means of parsing the document. (Unfortunately, the XPath implementation in current releases of Java has a performance bug that makes it perform extremely poorly on large documents.)

As an example, the following XPath expression refers to the text of the maxConnections node (with the value "100" in this case):

/configuration/maxConnections/text()
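As a rough illustration of evaluating that expression, here is a minimal sketch using the standard javax.xml.xpath API. It is not necessarily the approach covered on the next page; the class XPathTextDemo and its evaluate helper are hypothetical. It relies on the XPath.evaluate(String, InputSource) overload, which parses the XML and returns the string value of the result.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathTextDemo {
    // The sample configuration document from above
    static final String CONFIG_XML =
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
        + "<configuration>\n"
        + "  <maxConnections>100</maxConnections>\n"
        + "  <minConnections ignorable=\"true\">10</minConnections>\n"
        + "  <extraParams/>\n"
        + "</configuration>\n";

    // Evaluates an XPath expression directly against XML text, returning the string result.
    // Note: this overload parses the XML afresh on every call.
    static String evaluate(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String text = evaluate("/configuration/maxConnections/text()", CONFIG_XML);
        // XPath hands back text, so convert it to a number ourselves
        int maxConnections = Integer.parseInt(text.trim());
        System.out.println(maxConnections);
    }
}
```

Because this overload re-parses the document each time, it suits one-off lookups; for several queries against the same document, parsing once into a DOM Document and evaluating against that is usually preferable.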

while the following refers to the value of the ignorable attribute of the minConnections node:

/configuration/minConnections/@ignorable
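A sketch of reading the attribute, again assuming the standard javax.xml.xpath API and using hypothetical helper names, might parse the document once into a DOM Document and then evaluate expressions against it. It shows two variants: taking the attribute's string value (which is an empty string if the attribute is absent), and letting XPath perform the comparison and return a Boolean via XPathConstants.BOOLEAN.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathAttributeDemo {
    // The sample configuration document from above
    static final String CONFIG_XML =
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
        + "<configuration>\n"
        + "  <maxConnections>100</maxConnections>\n"
        + "  <minConnections ignorable=\"true\">10</minConnections>\n"
        + "  <extraParams/>\n"
        + "</configuration>\n";

    // Parse once into a DOM tree so several expressions can be evaluated against it
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // String value of an expression; a missing attribute yields ""
    static String stringValue(Document doc, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Let XPath do the comparison and return the result as a Boolean
    static boolean isIgnorable(Document doc) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Boolean) xpath.evaluate(
                    "/configuration/minConnections/@ignorable = 'true'",
                    doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(CONFIG_XML);
        System.out.println(stringValue(doc, "/configuration/minConnections/@ignorable"));
        System.out.println(isIgnorable(doc));
    }
}
```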

Next: evaluating XPath expressions in Java

Now that we've seen the overall principles of XML and XPath, on the next page we look at the actual code to evaluate XPath expressions in Java.

Follow the author on Twitter for the latest news and rants: @BitterCoffey