Conventional computer-networking environments support the exchange of information and data between many interconnected computer systems using a variety of mechanisms. Extensible markup language (XML) encoded data is now in widespread use for data transfer and representation in such systems. One example of a conventional information exchange system that operates between computer systems over a computer network such as the Internet is provided by a set of applications and protocols collectively referred to as the World Wide Web. In a typical conventional implementation of the World Wide Web, client computer systems operate a client software application referred to as a web browser. A typical web browser operates to provide hypertext transfer protocol (HTTP) requests for documents, referred to as “web pages,” over the computer network to web server computer systems. A web server software application operating in the web server computer system can receive and process an HTTP web page request and can return or “serve” the corresponding web page document or file specified (i.e., requested) in the client request back to the requesting client computer system over the computer network for receipt by the client's web browser. The web page is typically formatted in a markup language such as the hypertext markup language (HTML). Data exchanged between clients and servers may also be formatted in other markup languages, such as XML, or in a combination of markup languages, so that the receiving computer system can interpret the data encoded with the markup language information within the document in order to process a response.
In addition to simply accessing web pages, more recent conventional software and networking technologies that work in conjunction with protocols such as HTTP provide complete networked or web-based “applications” or services, sometimes referred to as “web services”, over a computer network such as the Internet. Conventional web services architectures allow server-to-server connectivity, exchange and processing of data for business or other applications. Presently, there is a convergence on the use of XML to encode data that is exchanged between network-based server applications such as the World Wide Web, web services, or other network-based applications, since XML is extensible and flexible and can be used to encode data of any type.
Conventional XML processing technologies that operate within computer systems generally rely on software processing to allow the computer systems (e.g., web servers) to interpret and process the XML-encoded data in a variety of ways. Several conventional XML technologies allow a software application to access (e.g., extract) XML-encoded data for application processing purposes. As an example, a web server can use conventional XML software processing technologies such as the Document Object Model (DOM) to convert XML files or documents into a DOM “tree” that allows a software application to access certain portions of the XML encoded data.
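The DOM approach described above can be illustrated with a minimal sketch using the `xml.dom.minidom` module from the Python standard library. The sample document, its element names, and the helper `book_names` are assumptions made for illustration only and do not come from any particular conventional system.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = (
    "<pub><year>1999</year>"
    "<book><author>A. Writer</author><name>First Book</name></book>"
    "</pub>"
)

def book_names(xml_text):
    """Build a DOM tree in memory and return the text of every <name> element."""
    dom = parseString(xml_text)
    names = []
    for node in dom.getElementsByTagName("name"):
        # Concatenate the text-node children of each <name> element.
        text = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        names.append(text)
    return names

print(book_names(SAMPLE))
```

The whole document is converted into a tree before any portion of it is accessed, which is the property that distinguishes this approach from event-based parsing.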
Other conventional XML processing technologies include the Simple API for XML (SAX), an application programming interface used to parse XML encoded data (referred to sometimes as XML documents) to gain access to the XML data. In addition, other XML-related technologies such as Extensible Stylesheet Language Transformations (XSLT) allow a developer of an XML-aware software application to define transformations of XML encoded data from one data format to another. XSLT is a language for converting, or transforming, documents written in XML into other formats, including HTML and other XML vocabularies. An XSLT document is used to transform an XML document, or a portion of data contained in such a document, from one format to another (e.g., XML to HTML). A schema is a description in a meta-language specifying the acceptable syntax of an XML vocabulary. A schema document is used to validate an XML document and confirm that its syntax is correct. A filter is an XSLT document used to produce a decision on the acceptability of an input XML document based on an arbitrary set of criteria. A filter verifies an input document based on semantic or other content (transformed or not transformed) not typically related to syntax, and in this way differs from schema validation.
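As an illustration of the SAX style of parsing mentioned at the start of the preceding paragraph, the following sketch uses the `xml.sax` module from the Python standard library. The handler class `NameCollector`, the helper `collect_names`, and the `name` element are hypothetical and chosen only for this example.

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Collect the character data of every <name> element from SAX events."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []
        self._buf = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self.in_name = True
            self._buf = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer them.
        if self.in_name:
            self._buf.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buf))
            self.in_name = False

def collect_names(xml_text):
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names

print(collect_names("<pub><book><name>A</name></book><book><name>B</name></book></pub>"))
```

The application receives a sequence of start, character, and end events and keeps only the state it needs; no tree of the document is retained.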
Other conventional tools allow markup language data, such as XML encoded data, to be used by software. To do so, the XML data must be parsed. Parsing applies a set of rules to the XML encoded input stream, removes delimiting characters, and generates tokens representing the XML elements. As noted above, a common representation of a group of XML tokens is a tree structure. To extract various portions of the tree for processing and output, a system such as the DOM can support a tree-oriented search language. One such conventional search language is specified by the World Wide Web Consortium (W3C), is referred to as XPATH, and is defined in the W3C XPATH specification. This XPATH specification defines a grammar that allows the selection of portions of an XML token tree. Most conventional implementations that use XPATH to access XML elements copy all the XML tokens into memory and build a static tree structure. A software application then runs an XPATH expression on the tree to extract subsets of the tree.
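The tree-based approach can be sketched with `xml.etree.ElementTree` from the Python standard library. That module supports only a limited subset of XPATH, which does not include numeric comparisons, so the sketch below applies the comparison on `year` in ordinary Python code and uses the supported `book[author]/name` path for the rest. The sample document, the enclosing `catalog` element, and the helper `old_book_names` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; <catalog> is an illustrative wrapper element.
SAMPLE = """<catalog>
  <pub><year>1999</year>
    <book><author>A</author><name>Old Title</name></book>
    <book><name>No Author</name></book>
  </pub>
  <pub><year>2005</year>
    <book><author>B</author><name>New Title</name></book>
  </pub>
</catalog>"""

def old_book_names(xml_text):
    """Approximate pub[year<2000]/book[author]/name/text() on an in-memory tree."""
    root = ET.fromstring(xml_text)      # the whole document is parsed into a tree
    results = []
    for pub in root.findall("pub"):
        year = pub.findtext("year")
        # The year<2000 predicate is evaluated here, outside ElementTree's XPATH subset.
        if year is None or int(year) >= 2000:
            continue
        # [author] is an existence predicate that ElementTree does support.
        for name in pub.findall("book[author]/name"):
            results.append(name.text)
    return results

print(old_book_names(SAMPLE))
```

Because the full tree is held in memory before the expression is applied, memory use grows with the size of the document, regardless of how small the selected subset is.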
There is another conventional method of extracting data corresponding to the XPATH expression. This other method examines XML tokens as the XML data “streams” through an XPATH evaluator. No tree is constructed. Rather, the stream is examined by the XPATH expression evaluator and, if there's a match, a portion of the stream is rerouted to the application. The remainder is discarded.
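A minimal sketch of the streaming approach is given below, again using `xml.sax` from the Python standard library. It is deliberately restricted to paths that use only the child axis and no predicates (for example `/pub/book/name`); the class `PathMatcher` and the helper `stream_select` are hypothetical names, and a full streaming XPATH evaluator would need considerably more machinery.

```python
import xml.sax

class PathMatcher(xml.sax.ContentHandler):
    """Match a child-axis-only path such as /pub/book/name against a SAX stream."""

    def __init__(self, path, on_match):
        super().__init__()
        self.target = path.strip("/").split("/")
        self.on_match = on_match   # callback that receives each matched text
        self.stack = []            # names of the currently open elements
        self.buf = None            # text buffer, active only inside a match

    def startElement(self, tag, attrs):
        self.stack.append(tag)
        if self.stack == self.target:
            self.buf = []

    def characters(self, content):
        # Keep only text that is a direct child of the matched element.
        if self.buf is not None and self.stack == self.target:
            self.buf.append(content)

    def endElement(self, tag):
        if self.buf is not None and self.stack == self.target:
            self.on_match("".join(self.buf))   # route the match to the application
            self.buf = None
        self.stack.pop()

def stream_select(xml_text, path):
    matches = []
    xml.sax.parseString(xml_text.encode("utf-8"), PathMatcher(path, matches.append))
    return matches

doc = "<pub><book><name>A</name><note>x</note></book><book><name>B</name></book></pub>"
print(stream_select(doc, "/pub/book/name"))
```

Only the stack of open element names and the buffer for the current match are kept; everything that does not match is dropped as soon as it has been seen.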
An XPATH expression can be found or contained in an XQUERY language statement or may be embedded within an XSLT document. The following XPATH expression is shown together with an XSLT fragment that generates it:
/pub[year<2000]/book[author]/name/text()
<xsl:for-each select="pub">
  <xsl:variable name="foo" select="year&lt;2000"/>
  <xsl:if test="$foo">
    <xsl:variable name="blah" select="author"/>
    <xsl:for-each select="book">
      <xsl:if test="$blah">
        <xsl:value-of select="name/text()"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:if>
</xsl:for-each>
The latest XPATH specification can be found at http://www.w3.org/TR/xpath, the contents of which are hereby incorporated by reference in their entirety, including XPath 2.0 draft specifications. XQUERY and XSLT are also defined as W3C standards that use XPATH expressions.
Another example of XML processing is schema validation. A schema defines the structure and allowed values of an XML document. A document type definition (DTD) is one example. XML Schema is defined by the W3C, and the latest specification can be found at http://www.w3.org/TR/xmlschema-1/ and http://www.w3.org/TR/xmlschema-2/, the contents of which are hereby incorporated by reference in their entirety. OASIS has defined RELAX NG, a different way of specifying a schema, which in turn can be found at http://www.oasis-open.org/committees/relax-ng/spec-20011203.html, the contents of which are hereby incorporated by reference in their entirety. There have been and may exist other means of specifying schema structure, but they all share the same goal and must perform similar XML processing operations.
A conventional XPATH expression is composed of a location path constructed of one or more location steps. A location step is composed of an axis, a node test and a predicate. In the example /navy/battleship there are two location steps: navy and battleship. In this example, each step uses the default child:: axis and has no predicate. A richer example is as follows:
/pub[year<2000]//book[author]/page/12/text()
The above translates into a query that returns the text nodes whose ancestors are, from nearest to farthest, 12, page and book, where the book node must have an author child and must be a descendant of a pub node that has a year child whose value is less than 2000.
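The decomposition of a location path into location steps can be sketched as follows. This is a simplified illustration and not a conforming XPATH parser: it recognizes only the abbreviated separators / and //, a node test, and at most one bracketed predicate per step. The names `Step` and `split_location_path` are hypothetical, and for simplicity the sketch labels a step introduced by // with the axis name "descendant".

```python
import re
from typing import NamedTuple, Optional

class Step(NamedTuple):
    axis: str                  # "child" or "descendant" (simplified labels)
    node_test: str             # e.g. "pub", "book", "text()"
    predicate: Optional[str]   # text between [ and ], or None if absent

# A separator, then a node test, then an optional [predicate].
STEP_RE = re.compile(r"(//|/)([^/\[\]]+)(?:\[([^\]]*)\])?")

def split_location_path(path):
    """Split an abbreviated absolute location path into its location steps."""
    steps = []
    pos = 0
    while pos < len(path):
        m = STEP_RE.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse location path at offset {pos}: {path!r}")
        sep, node_test, predicate = m.groups()
        axis = "child" if sep == "/" else "descendant"
        steps.append(Step(axis, node_test, predicate))
        pos = m.end()
    return steps

for step in split_location_path("/pub[year<2000]//book[author]/page/12/text()"):
    print(step)
```

For the richer example above, the sketch yields five steps, of which the first two carry the predicates year<2000 and author.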
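The axis component of a location step defines the “direction” that needs to be examined from the current context node. Examples of axes include child::, which is the default axis and may be omitted (so that /navy/battleship is equivalent to /child::navy/child::battleship), and descendant-or-self::, which underlies the abbreviation // (shorthand for /descendant-or-self::node()/).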
Axes fall into two categories: forward and reverse. A forward axis refers to nodes that follow the context node in document order. A reverse axis refers to nodes that precede the context node in document order. An example of a reverse axis is parent:: and an example of a forward axis is child::.
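The difference between a forward axis and a reverse axis can be illustrated on a DOM tree built with `xml.dom.minidom` from the Python standard library. The helpers `child_axis` and `parent_axis` are hypothetical, and they are restricted to element nodes for brevity (in XPATH, the parent of the outermost element is the document root node, which this sketch does not return).

```python
from xml.dom.minidom import parseString

def child_axis(node):
    """Forward axis: element children of the context node, in document order."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def parent_axis(node):
    """Reverse axis: the parent element of the context node, if there is one."""
    parent = node.parentNode
    if parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        return [parent]
    return []

doc = parseString("<navy><battleship><gun/></battleship></navy>")
battleship = doc.getElementsByTagName("battleship")[0]

print([n.tagName for n in child_axis(battleship)])   # ['gun']
print([n.tagName for n in parent_axis(battleship)])  # ['navy']
```

A reverse axis requires access to nodes that appear earlier in the document than the context node, which is straightforward on an in-memory tree but requires retained state when the document is processed as a stream.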
An XPATH expression can define or include a test. For example, a “node test” selects a group of nodes within a hierarchical arrangement of nodes, such as a tree, for the XPATH expression. The nodes can be explicitly specified (/navy/battleship) or indirectly specified. Indirect node selection can use the text() node test, which returns all the child nodes that are text nodes. A predicate is a Boolean test in the location step. A predicate commonly has the syntax [value1 op value2], where value1 and value2 are XPATH expressions and op is a simple comparison operator; a predicate consisting of a single expression, such as [author], tests whether that expression selects any node. Both nodes and attributes can be values in the predicate. An example is as follows:
/pub[year<2000]//book[author]/page/12/text()
In the first location step, /pub[year<2000], the axis is the default child:: axis, pub is the node test and [year<2000] is the predicate. In the second location step, // is shorthand for /descendant-or-self::node()/, the node test is book and [author] is the predicate.
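The two predicates discussed above can be evaluated against an element with a small sketch built on `xml.etree.ElementTree`. It handles only two simplified forms, an existence test such as `author` and a comparison of a child element against a numeric literal such as `year<2000`; the helper `predicate_holds`, the pattern `PRED_RE`, and the sample element are assumptions made for illustration rather than a general XPATH predicate evaluator.

```python
import operator
import re
import xml.etree.ElementTree as ET

OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}

# child-name, optionally followed by a comparison operator and a numeric literal.
PRED_RE = re.compile(
    r"^\s*([A-Za-z_][\w.-]*)\s*(?:(<=|>=|!=|=|<|>)\s*(-?\d+(?:\.\d+)?))?\s*$"
)

def predicate_holds(elem, predicate):
    """Return True if the simplified predicate is satisfied for elem."""
    m = PRED_RE.match(predicate)
    if m is None:
        raise ValueError(f"unsupported predicate: {predicate!r}")
    name, op, literal = m.groups()
    children = elem.findall(name)
    if op is None:
        return bool(children)          # existence test, e.g. [author]
    # A comparison against a node set holds if any node in the set satisfies it.
    for child in children:
        try:
            value = float(child.text)
        except (TypeError, ValueError):
            continue                   # missing or non-numeric text never matches
        if OPS[op](value, float(literal)):
            return True
    return False

pub = ET.fromstring(
    "<pub><year>1999</year><book><author>A</author></book><book/></pub>"
)
books = pub.findall("book")
print(predicate_holds(pub, "year<2000"))    # True
print(predicate_holds(books[0], "author"))  # True
print(predicate_holds(books[1], "author"))  # False
```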
FIG. 1 illustrates how XML elements may be organized in a tree structure. Using the sample tree in FIG. 1, the expression
/pub[year<2000]//book[author]/page/12/text()
returns a nodeset of three Para elements. The solid lines represent the path and the dashed lines represent predicate evaluation. Conventional XPATH implementations are limited to evaluating and applying XPATH expressions against XML data within software applications that execute in conjunction with an operating system in a computerized device.