XML parsing in Java with XPath
In case you're not too familiar with XML, we'll start with a brief overview
of a typical, simple XML document. We won't get bogged down in the less common
features of XML, but will concentrate on the most common ones. Then, as a
basis for parsing the XML document, we'll use XPath, which is essentially
a scheme for referring to parts of an XML document as though it were
a file system.
Here's a typical, simple, small XML document:
<?xml version="1.0" encoding="iso-8859-1"?>
If you're familiar with HTML, various features of XML will appear familiar.
Here are the main features:
- the document starts with a header (the XML declaration), which indicates
that an XML document is to follow and gives its character encoding;
- a document consists of a number of nodes, which generally have
a start and end tag;
- each node may have text content (e.g. the text of the maxConnections
node is 100);
- each node may also have a number of attributes (here,
the minConnections node has an ignorable attribute);
- if a node has no text, its start and end tags may be combined,
as in the extraParams node here.
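As a rough illustration of these features, here is a minimal sketch that parses a small document with the standard DOM parser and reads back a node's text, an attribute, and an empty node. The document embedded in the code is a stand-in, not the article's original: the root name config, the text 10 and the attribute value "true" are assumptions made purely for illustration, as are the class and method names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlFeaturesSketch {
    // Hypothetical document: the root name "config", the text "10" and the
    // attribute value "true" are assumptions, not taken from the article.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
      + "<config>\n"
      + "  <maxConnections>100</maxConnections>\n"
      + "  <minConnections ignorable=\"true\">10</minConnections>\n"
      + "  <extraParams/>\n"
      + "</config>\n";

    // Parse the XML text into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }

    // Return the first element with the given tag name.
    static Element first(Document doc, String name) {
        return (Element) doc.getElementsByTagName(name).item(0);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Text content of a node: prints 100
        System.out.println(first(doc, "maxConnections").getTextContent());
        // Attribute of a node: prints true
        System.out.println(first(doc, "minConnections").getAttribute("ignorable"));
        // A combined start/end tag yields a node with no children: prints false
        System.out.println(first(doc, "extraParams").hasChildNodes());
    }
}
```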
Compared to typical HTML, XML has a slightly "tighter" format:
- every document must have exactly one root node;
- all nodes must have properly matched tags; it's not possible
to have a series of start tags without end tags, as is typical
with li and p nodes in HTML.
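To see this strictness in practice, the following sketch feeds a few strings to the standard Java DOM parser and reports whether each is accepted. The class name, the helper isWellFormed and the sample strings are all made up for illustration. A failed parse may also cause the parser to print a diagnostic to standard error, depending on the implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedSketch {
    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejects the document outright instead of guessing.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Properly matched tags, one root node: prints true
        System.out.println(isWellFormed("<list><item>a</item><item>b</item></list>"));
        // HTML-style start tags without end tags: prints false
        System.out.println(isWellFormed("<list><item>a<item>b</list>"));
        // Two root nodes: prints false
        System.out.println(isWellFormed("<a/><b/>"));
    }
}
```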
XML documents can have other features that we won't get bogged down
in here, such as namespaces and document type definitions (essentially
a means for a document to be validated when it is read).
As mentioned, XPath is a scheme for accessing parts of an XML document
as though it were a file system. For relatively short documents where you need
"random access" to an XML file, it's usually the most practical means of
parsing the document. (Unfortunately, the XPath implementation in
current releases of Java has a serious weakness: its performance degrades
catastrophically on large documents.)
As an example, the following XPath expression refers to the text
of the maxConnections node (with the value "100" in this case):
while the following refers to the value of the ignorable attribute
of the minConnections node:
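As a rough sketch of how expressions of this kind behave, the code below evaluates two XPath expressions with the standard javax.xml.xpath API. It assumes a hypothetical document whose root node is named config; that root name, the minConnections text 10, the attribute value "true", and therefore the exact paths /config/maxConnections/text() and /config/minConnections/@ignorable are assumptions for illustration rather than the article's own expressions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical document: root name and values are assumptions.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
      + "<config>\n"
      + "  <maxConnections>100</maxConnections>\n"
      + "  <minConnections ignorable=\"true\">10</minConnections>\n"
      + "  <extraParams/>\n"
      + "</config>\n";

    // Evaluate an XPath expression against the document, returning a string.
    static String evaluate(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        // Path to a node's text, read like a file path: prints 100
        System.out.println(evaluate("/config/maxConnections/text()"));
        // The @ prefix selects an attribute: prints true
        System.out.println(evaluate("/config/minConnections/@ignorable"));
    }
}
```

Each path step names a child node, much as each component of a file path names a directory; text() selects a node's text and @name selects an attribute.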
Next: evaluating XPath expressions in Java
Now that we've seen the overall principles of XML and XPath, on the next
page we look at the actual code to evaluate XPath
expressions in Java.