A parsing scheme takes a document and returns records. A record is a series of values. Each value corresponds to a field, which has a name and a type. An example of a record is {“blue”, 19, “$0.50”}. The corresponding fields for that record may be a string field named color, an integer field named quantity, and a price field named price.
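As a minimal sketch of the record and field vocabulary above, the example record can be paired with its fields as follows. The `Field` class and the type labels "string", "integer", and "price" are hypothetical names chosen for illustration only; the text does not prescribe any particular representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field description: a name plus a type label."""
    name: str
    type: str  # illustrative labels such as "string", "integer", "price"


# The three fields from the example: color, quantity, price.
fields = [
    Field("color", "string"),
    Field("quantity", "integer"),
    Field("price", "price"),
]

# A record is a series of values, one per field, in field order.
record = ("blue", 19, "$0.50")

# Pair each value with the name of its corresponding field.
named = {field.name: value for field, value in zip(fields, record)}
```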
XML is a standard text markup language. An XML document has a hierarchical organization, so the document can be viewed as a “tree”. For example, consider the following XML document:
<html>
  <table>
    <tr><td>Price</td><td>Quantity</td></tr>
    <tr><td>$5</td><td>10</td></tr>
    <tr><td>$10</td><td>5</td></tr>
    <tr><td>$12</td><td>7</td></tr>
  </table>
</html>
The document can be viewed as a hierarchical tree, as follows:
html
|- table
   |- tr
   |  |- td - Price
   |  |- td - Quantity
   |- tr
   |  |- td - $5
   |  |- td - 10
   |- tr
   |  |- td - $10
   |  |- td - 5
   |- tr
      |- td - $12
      |- td - 7
XML information and the XML spec can be found at http://www.w3c.org/XML/. XPath is a query language used to specify portions of the XML tree. The standard for XPath is available at http://www.w3.org/TR/xpath. XPath is used to refer to specific nodes or sets of nodes from the XML document. Unless otherwise stated in this document, an XPath refers to a set of nodes, rather than a specific node. This document uses the terms simple XPath and complex XPath, with the following definitions:
        Simple XPath: A simple XPath has only node names and ‘/’ characters separating them, such as “/html/table/tr”.
        Complex XPath: A complex XPath has parametric information, such as “only get tables which contain a TR”, expressed as “/html/table[tr]”.
For example, in the sample XML document, the simple XPath “/html/table/tr” refers to the tr nodes, and the complex XPath “/html/table/tr[1]” refers to the first tr node.
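The distinction can be tried out with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. This is a sketch under one assumption: because ElementTree searches relative to the element it is called on, each path below is written relative to the html root, so “/html/table/tr” becomes "./table/tr".

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<html><table>"
    "<tr><td>Price</td><td>Quantity</td></tr>"
    "<tr><td>$5</td><td>10</td></tr>"
    "<tr><td>$10</td><td>5</td></tr>"
    "<tr><td>$12</td><td>7</td></tr>"
    "</table></html>"
)

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Simple XPath /html/table/tr: node names and '/' only; selects every tr node.
all_rows = root.findall("./table/tr")

# Complex XPath /html/table/tr[1]: the position predicate keeps the first tr.
first_row = root.findall("./table/tr[1]")

# Complex XPath /html/table[tr]: only tables that contain a tr child.
tables_with_rows = root.findall("./table[tr]")

print(len(all_rows), len(first_row), len(tables_with_rows))
```

Each call returns a list of nodes, which matches the convention that an XPath refers to a set of nodes rather than a specific node.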
A parser based on XPath may employ the concepts of “row XPath” and “column XPath”. A row XPath indicates a method to parse record regions from an XML document. A column XPath indicates a method to parse fields from each record region. For example, for the sample document, a parser with row XPath “/html/table/tr” produces record regions:
        <tr><td>Price</td><td>Quantity</td></tr>
        <tr><td>$5</td><td>10</td></tr>
        <tr><td>$10</td><td>5</td></tr>
        <tr><td>$12</td><td>7</td></tr>
If the parser has column XPaths “td[1]/text()” (corresponding to a price field) and “td[2]/text()” (corresponding to a quantity field), then the parsed records are:
        {“Price”, “Quantity”}
        {“$5”, “10”}
        {“$10”, “5”}
        {“$12”, “7”}
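The row and column scheme can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not a definitive implementation. The function name `parse_records` and the synthetic `holder` element are hypothetical. ElementTree supports only part of XPath, so the sketch wraps the root so that an absolute row path can be evaluated, and it handles a trailing "/text()" step by reading the node's text directly.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<html><table>"
    "<tr><td>Price</td><td>Quantity</td></tr>"
    "<tr><td>$5</td><td>10</td></tr>"
    "<tr><td>$10</td><td>5</td></tr>"
    "<tr><td>$12</td><td>7</td></tr>"
    "</table></html>"
)

TEXT_STEP = "/text()"


def parse_records(document, row_xpath, column_xpaths):
    """Apply a row XPath, then each column XPath within every record region."""
    root = ET.fromstring(document)

    # Assumption: ElementTree cannot evaluate an absolute path on an element,
    # so the root is placed under a synthetic parent and searched relatively.
    holder = ET.Element("holder")
    holder.append(root)
    regions = holder.findall("." + row_xpath)

    records = []
    for region in regions:
        values = []
        for column_xpath in column_xpaths:
            # ElementTree has no text() step: strip it and read .text instead.
            path = column_xpath
            if path.endswith(TEXT_STEP):
                path = path[: -len(TEXT_STEP)]
            node = region.find(path)
            values.append(node.text if node is not None else None)
        records.append(tuple(values))
    return records


records = parse_records(SAMPLE, "/html/table/tr", ["td[1]/text()", "td[2]/text()"])
for record in records:
    print(record)
```

As in the text, the header row is returned as an ordinary record, because the row XPath “/html/table/tr” matches every tr node.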