This invention relates to searching of data structures.
The Extensible Markup Language (XML) is a subset of the Standard Generalized Markup Language (SGML) and its goal is to enable generic SGML to be served, received, and processed on the Internet. XML has been designed for ease of implementation and for interoperability with both SGML and the Hyper Text Markup Language (HTML). Therefore, in one aspect, XML can be thought of as a serialization format or transfer syntax. In another aspect, the emphasis is shifted from the textual aspects of XML to the structural aspects of XML. The layering of XML technologies allows the XML structure to be used as a common language by disparate software agents, such that the software agents can exchange information and instructions across language, process, host, and vendor boundaries.
Entities are the basic components of XML. One or more entities can form a single structured data object, which is referred to as an XML document. An XML document has both a physical and a logical structure. The physical structure is the collection of entities, which typically correspond to files or network messages. The logical structure is defined by a document information item. The document information item acts as a root of a hierarchical tree of information items. Examples of information items include processing instruction information items, comment information items and character information items. Each information item has one or more named properties. Two of these properties serve to make the hierarchical structure explicit: a parent property, which references an information item at a higher level in the hierarchical tree, and a children property, which references an ordered collection of immediate descendants in the hierarchical tree of information items. A software module called an XML processor reads XML documents and provides access to their content and structure. The XML processor typically gets instructions from a computer program application to read available XML data. The XML processor then reads the XML data and provides it to the requesting application.
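By way of illustration only, the parent and children properties described above can be observed with an off-the-shelf XML processor. The following minimal sketch assumes Python's standard xml.dom.minidom module as the processor; its DOM nodes only approximate information items, and the sample document and the helper name describe are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
DOC = (
    "<guitars><!-- stock -->"
    "<guitar><model>A</model></guitar>"
    "<guitar><model>B</model></guitar>"
    "</guitars>"
)

# The processor reads the XML data and hands back a document node,
# which acts as the root of the hierarchical tree.
doc = minidom.parseString(DOC)
root = doc.documentElement  # the <guitars> element


def describe(node, depth=0):
    """Return one indented line per node, walking the children in order."""
    lines = ["  " * depth + node.nodeName]
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


# "children": ordered collection of immediate descendants.
child_names = [child.nodeName for child in root.childNodes]

# "parent": reference to the node one level up in the tree.
parent_of_root = root.parentNode

print("\n".join(describe(doc)))
```

In this sketch, childNodes plays the role of the children property and parentNode the role of the parent property; the document node itself has no parent.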
XML schemas define vocabularies that can be used to describe XML documents. XML schemas are themselves XML documents that can be parsed and generated using the same technologies that are used to generate the XML documents they describe. The XML schema specification assumes that at least two XML documents are in use: an instance document and a schema document. The instance document contains the actual information of interest, and the schema document contains the structure and type of the instance document, and can thus be thought of as a “grammar” for the XML document. The distinction between schema and instance is similar to the distinction between class and object in object-oriented programming.
One of the more useful components of schemas is type definitions. XML schemas support two categories of types: simple types and complex types. Simple types are represented purely as text strings, while complex types describe the children and attributes of elements in an instance document. The schemas can therefore define allowable types of nodes at given levels in the hierarchical tree that represents the logical structure of the document, and relations between nodes in the hierarchical tree. Furthermore, XML schemas are highly adaptable to domains beyond XML, such as database and object technologies.
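As a rough illustration of the schema and instance relationship, the sketch below parses a small schema document and an instance document with the same parser and reads the complex type's declared children and attributes. It assumes Python's standard xml.etree.ElementTree module, which does not perform schema validation; the schema, the instance, and all element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema document: one element with a complex type whose
# children and attribute are themselves of simple types.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="guitar">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="model" type="xs:string"/>
        <xs:element name="strings" type="xs:integer"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical instance document carrying the actual information.
INSTANCE = '<guitar id="g1"><model>A</model><strings>6</strings></guitar>'

# The schema is itself XML, so the same parser handles both documents.
schema = ET.fromstring(SCHEMA)
instance = ET.fromstring(INSTANCE)

decl = schema.find("xs:element[@name='guitar']", NS)

# Children and attributes that the complex type allows, with their types.
declared_children = {
    e.get("name"): e.get("type")
    for e in decl.findall("./xs:complexType/xs:sequence/xs:element", NS)
}
declared_attrs = {
    a.get("name"): a.get("type")
    for a in decl.findall("./xs:complexType/xs:attribute", NS)
}

# What the instance actually contains (a comparison, not a validation).
instance_children = [child.tag for child in instance]
instance_attrs = sorted(instance.attrib)

print(declared_children, declared_attrs)
print(instance_children, instance_attrs)
```

The comparison at the end only shows that the instance uses the names the schema declares; a conforming validator would also check order, occurrence counts, and the lexical form of each simple-typed value.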
The hierarchical relationship between the information items of a given XML document lends itself to formally addressing subsets of the XML document (for example “give me all the child elements named bob whose id attribute is not id-xyz”). XPath expressions provide a simple text-based addressing language that captures the traversal of a document in a programming language-neutral fashion. The XPath expressions operate on the abstract, logical structure of an XML document, rather than the XML document's surface syntax. The XPath syntax is very similar to syntaxes used to traverse file systems or other hierarchical structures. For example, the XPath expression “/guitars/guitar/model” locates all model elements that are children of guitar elements, which are themselves children of the root node guitars in some arbitrary XML document. In addition to their use for addressing nodes, XPath expressions can also be used for matching, i.e., testing whether a node matches one or more particular criteria. The searches can be made rather elaborate in order to precisely identify document subsets of interest. Further details and examples about XPaths and searches can be found in “XML Path Language (XPath), version 2.0 W3C working draft,” by W3C® (MIT, INRIA, Keio), 22 Aug. 2003, which is incorporated herein by reference in its entirety (see http://www.w3.org/TR/xpath20/). Typically, when using XPath to test whether a node matches one or more particular criteria in an instance of an XML document, the entire tree of nodes is searched, which may be a time-consuming task, particularly for large trees.
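The addressing example above can be sketched with Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath and evaluates paths relative to an element. For that reason “/guitars/guitar/model” is written below as “./guitar/model” relative to the guitars root. The sample document, model names, and id values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<guitars>
  <guitar id="g1"><model>A</model></guitar>
  <guitar id="g2"><model>B</model></guitar>
  <bass id="b1"><model>C</model></bass>
</guitars>
""".strip()

root = ET.fromstring(DOC)  # root is the <guitars> element

# Addressing: model elements that are children of guitar elements,
# which are themselves children of the root.
models = [m.text for m in root.findall("./guitar/model")]

# Addressing with a criterion: child guitar elements whose id
# attribute is not "g1". The attribute test is done in Python here
# rather than inside the path expression.
others = [g.get("id") for g in root.findall("./guitar") if g.get("id") != "g1"]

# Walking every element, as a naive match test over the whole tree does.
visited = sum(1 for _ in root.iter())

print(models)
print(others)
print(visited)
```

The final count shows that a naive match test touches every element in the tree, which is the cost the preceding paragraph refers to for large documents.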