Hypertext Markup Language (HTML), Standard Generalized Markup Language (SGML), and eXtensible Markup Language (XML) are examples of widely used markup languages. These markup languages are used to ascribe structure to the content of a document through the use of tags or element types. Documents written in these languages are therefore referred to as structured documents. XML has grown in popularity because it allows users to define their own tags and document structures. XML is used to create complex documents and to facilitate data exchange and data connectivity.
Querying markup language data is difficult because it involves the structure (e.g., tags) and the content (e.g., data associated with the tags) of the document. Effective markup language querying necessitates effective processing of both structure and content.
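By way of illustration only, the following minimal sketch shows a single query that depends on both structure and content. It uses the `xml.etree.ElementTree` module of the Python standard library; the sample document, its tag names, and its values are hypothetical and are not taken from any particular system.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
doc = ET.fromstring(
    "<library>"
    "<book><title>XML Basics</title><author>A. Writer</author></book>"
    "<book><title>SGML Handbook</title><author>B. Writer</author></book>"
    "</library>"
)

# Structure: the query follows the tag path book/title.
# Content: the book must have an author child whose text is 'B. Writer'.
titles = [t.text for t in doc.findall("book[author='B. Writer']/title")]
print(titles)
```

Neither the tag path nor the author value alone is sufficient to answer the query; both must be evaluated together.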
Existing technology preserves the markup language structure explicitly, in the form of a node tree. The node tree can be stored using object database technology or hybridized relational technology. In either implementation, a complex node tree is stored as objects in a database, with each object holding pointers to adjacent nodes.
There are problems associated with these technologies. For example, because each node is an object reached through pointers, each search path must be traversed in full. This results in an excessive search space, since all intervening nodes between significant nodes must be read and processed.
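The traversal cost described above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not a description of any particular database product: the `Node` class, the `find_by_path` function, and the sample document are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One element stored as an object, with pointers to its child nodes."""
    tag: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def find_by_path(root: Node, path: List[str]) -> Tuple[List[Node], int]:
    """Follow a tag path from the root; return the matches and the number of nodes read."""
    visited = 0
    matches: List[Node] = []

    def walk(node: Node, depth: int) -> None:
        nonlocal visited
        visited += 1  # every node reached through a pointer must be read
        if node.tag != path[depth]:
            return
        if depth == len(path) - 1:
            matches.append(node)
            return
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return matches, visited


# Hypothetical sample document: a library holding two books.
doc = Node("library", children=[
    Node("book", children=[Node("title", "XML Basics"), Node("author", "A. Writer")]),
    Node("book", children=[Node("title", "SGML Handbook"), Node("author", "B. Writer")]),
])

titles, visited = find_by_path(doc, ["library", "book", "title"])
print([n.text for n in titles], visited)
```

In this sketch, locating the two title nodes requires reading all seven nodes of the tree, including the intervening book nodes and the author nodes that are of no interest to the query.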
In view of the foregoing, it would be desirable to provide improved techniques for processing structured documents. In particular, it would be desirable to provide a technique that does not necessitate traversal of node trees. Ideally, such a technique would support linear processing of content. In addition, such a technique would rely upon indirect or inferred structural processing instead of the explicit structural processing associated with prior art techniques.