XML parsing in Java with XPath

In case you're not too familiar with XML, we'll start with a brief overview of a typical, simple XML document. We won't get bogged down in the less common features of XML, but will concentrate on the most common ones. Then, as a basis for parsing the XML document, we'll use XPath, which is essentially a scheme for referring to parts of an XML document as though it were a file system.

XML overview

Here's a typical, simple, small XML document:

<?xml version="1.0" encoding="iso-8859-1"?>
<configuration>
  <maxConnections>100</maxConnections>
  <minConnections ignorable="true">10</minConnections>
  <extraParams/>
</configuration>

If you're familiar with HTML, various features of XML will appear familiar. Here are the main features:

  • the document starts with a header (the XML declaration), which indicates that an XML document is to follow and states its character encoding;
  • a document consists of a number of nodes (more formally, elements), which generally have a start and end tag;
  • each node may have text content (e.g. the text of the maxConnections node is 100);
  • each node may also have a number of attributes (here, the minConnections node has an ignorable attribute whose value is true);
  • if a node has no content, its start and end tags may be combined into a single tag, as in the extraParams node here.
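
To make these features concrete, here is a minimal sketch that loads the sample document with the JDK's standard DOM parser and reads back the text, the attribute and the empty node described above. The class name XmlOverviewDemo and its helper methods are illustrative assumptions, not part of the original article; the XPath approach introduced below is usually more convenient for this kind of lookup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: the name and helpers are for illustration only.
public class XmlOverviewDemo {

    // The sample document from the text, held in a string for convenience.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
        + "<configuration>\n"
        + "  <maxConnections>100</maxConnections>\n"
        + "  <minConnections ignorable=\"true\">10</minConnections>\n"
        + "  <extraParams/>\n"
        + "</configuration>\n";

    // Parses the XML into a DOM tree. The bytes are encoded as ISO-8859-1
    // so that they match the encoding named in the XML header.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    // Returns the first node with the given tag name beneath the parent.
    static Element first(Element parent, String name) {
        return (Element) parent.getElementsByTagName(name).item(0);
    }

    public static void main(String[] args) throws Exception {
        Element root = parse(SAMPLE).getDocumentElement();

        // Text content of a node
        System.out.println(first(root, "maxConnections").getTextContent()); // 100

        // Attribute value of a node
        System.out.println(first(root, "minConnections").getAttribute("ignorable")); // true

        // A node written with a combined start/end tag has no children at all
        System.out.println(first(root, "extraParams").hasChildNodes()); // false
    }
}
```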

Compared to typical HTML, XML has a slightly "tighter" format:

  • every document must have exactly one root node;
  • all nodes must have properly matched tags; it's not possible to have a series of start tags without end tags, as is typical with li and p nodes in HTML.
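
One way to see these two rules in action is to hand some HTML-style markup to an XML parser and check whether it is rejected. The following is a rough sketch using the JDK's DOM parser; the class WellFormedCheck and the method isWellFormed are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class WellFormedCheck {

    // Returns true if the parser accepts the text as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports broken structure as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly matched tags, one root node
        System.out.println(isWellFormed("<ul><li>one</li><li>two</li></ul>")); // true

        // HTML-style list items without end tags
        System.out.println(isWellFormed("<ul><li>one<li>two</ul>")); // false

        // Two root nodes
        System.out.println(isWellFormed("<a/><b/>")); // false
    }
}
```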

XML documents can have other features that we won't get bogged down in here, such as namespaces and document type definitions (essentially a means for a document to be validated when it is read).

XPath

As mentioned, XPath is a scheme for accessing parts of an XML document as though it were a file system. For relatively short documents where you need "random access" to the contents of an XML file, it's usually the most practical means of parsing the document. (Unfortunately, the XPath implementation in current releases of Java has a serious performance flaw: it performs catastrophically on large documents.)

As an example, the following XPath expression refers to the text of the maxConnections node (with the value "100" in this case):

/configuration/maxConnections/text()

while the following refers to the value of the ignorable attribute of the minConnections node:

/configuration/minConnections/@ignorable
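
As a brief preview, the two expressions above could be evaluated against the sample document with the standard javax.xml.xpath API roughly as follows. This is only a minimal sketch: the class name XPathPreview and the evaluate helper are assumptions made for illustration, and the document is re-parsed on every call, which is fine for a demo but not something you would do in real code.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical preview class, for illustration only.
public class XPathPreview {

    // The sample configuration document. The XML header is left out here
    // because the text is supplied as characters rather than encoded bytes.
    static final String SAMPLE =
          "<configuration>\n"
        + "  <maxConnections>100</maxConnections>\n"
        + "  <minConnections ignorable=\"true\">10</minConnections>\n"
        + "  <extraParams/>\n"
        + "</configuration>\n";

    // Evaluates an XPath expression against the sample and returns the
    // result as a string.
    static String evaluate(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(SAMPLE)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(evaluate("/configuration/maxConnections/text()"));     // 100
        System.out.println(evaluate("/configuration/minConnections/@ignorable")); // true
    }
}
```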

Next: evaluating XPath expressions in Java

Now that we've seen the overall principles of XML and XPath, on the next page we look at the actual code to evaluate XPath expressions in Java.

Unless otherwise stated, the Java programming articles and tutorials on this site are written by Neil Coffey.
Copyright © Javamex UK 2009. All rights reserved.