Programming languages continue to evolve to facilitate specification by programmers as well as efficient execution. In the early days of computer languages, low-level machine code was prevalent. With machine code, a computer program or the instructions comprising a computer program were written in machine languages or assembly languages and executed by the hardware (e.g., microprocessor). These languages provided a means to efficiently control computing hardware, but were very difficult for programmers to understand and made it hard to develop sophisticated logic. Subsequently, languages were introduced that provided various layers of abstraction. Accordingly, programmers could write programs at a higher level with a higher-level source language, which could then be converted via a compiler or interpreter to the lower-level machine language understood by the hardware. Further advances in programming have provided additional layers of abstraction to allow more advanced programming logic to be specified much more quickly than ever before. However, these advances do not come without a processing cost.
Compilers and/or interpreters bear the burden of translating high-level logic into executable machine code. In general, compilers and/or interpreters are components that receive a program specified in a source programming language (e.g., C, C#, Visual Basic, Java . . . ) and convert the logic provided thereby to machine language that is executable by a hardware device. However, the conversion need not be done verbatim. In fact, conventional compilers and/or interpreters analyze the source code and generate very efficient code. For example, programmers write code that sets forth a logical flow of operations that is intuitive and easy for humans to understand, but is often inefficient for a computer to execute. Compilers and/or interpreters can identify inefficiencies and improve program performance at the hardware level by eliminating unnecessary operations and/or rearranging the execution of instructions while still achieving the intended results. In this manner, programmers can create robust and efficient software programs.
Extensible Markup Language (XML) has become quite popular. XML is a markup language, derived from Standard Generalized Markup Language (SGML), that provides an open, text-based format for describing and exchanging structured data. Similar to HTML (HyperText Markup Language), XML is a tag-based language that defines a strict tree structure or hierarchy. Unlike HTML, which is a display-oriented language, XML is a general-purpose language for representing structured data without including information pertaining to how it is to be displayed. XML consists of elements and attributes, among other things.
XML elements are structural constructs that consist of a start tag, an end or close tag, and the information or content that is contained between the tags. A start tag is formatted as “<tagname>” and an end tag is formatted as “</tagname>.” In an XML document, start and end tags can be nested within other start and end tags. All elements that occur within a particular element must have their start and end tags occur before the end tag of that particular element. This defines a strict tree-like hierarchical structure. Each element forms a node in this tree, and potentially has child or branch nodes. The child nodes represent any XML elements that occur between the start and end tags of the parent node.
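The nesting rule described above can be illustrated with a minimal sketch in Python using the standard xml.etree.ElementTree module. The sample document, its element names, and the outline helper are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
DOC = (
    "<library>"
    "<book><title>Dune</title><author>Herbert</author></book>"
    "<book><title>Emma</title></book>"
    "</library>"
)


def outline(element, depth=0):
    """Return one (depth, tag) pair per element, parents before children."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an Element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
for depth, tag in outline(root):
    print("  " * depth + tag)  # indentation mirrors the depth in the tree

# Tags that overlap instead of nesting do not form a tree, so the parser
# rejects the document.
try:
    ET.fromstring("<a><b></a></b>")
    well_formed = True
except ET.ParseError:
    well_formed = False
```

Each start/end tag pair becomes one node, and the elements written between a parent's tags become that node's children. The second input closes the outer element before the inner one, which is why it fails to parse.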
XML can accommodate an unlimited number of database schemas. Within each schema, a dictionary of element names is defined. The dictionary of element names defined by a schema is referred to as a namespace. Within an XML document, element names are qualified by namespace identifiers. When qualified by a namespace identifier, a tag name appears in the form “[namespace]:[tagname]”. This model enables the same element name to appear in multiple schemas, or namespaces, and for instances of these duplicate element names to appear in the same XML document without colliding. Start tags can declare an arbitrary number of attributes, which declare property values associated with the element being declared. Attributes are declared within the start tag, separated by whitespace, using the form “<[tagname] [attribute1] [attribute2] . . . [attributeN]>”, where attribute1 through attributeN are declarations of an arbitrary number of tag attributes. Each attribute declaration is of the form “[attributeName]="[attributeValue]"”, where each attribute is identified by a name that is unique within the tag, followed by an “=” character, followed by the quoted value of the attribute.
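As a brief sketch of how namespace-qualified names and attributes look to a program, the following Python example parses a small document with xml.etree.ElementTree. The prefixes inv and hr, the namespace URIs, and the element and attribute names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two schemas both define an element named "item".
DOC = (
    '<order xmlns:inv="urn:example:inventory" xmlns:hr="urn:example:staff">'
    '<inv:item id="42" unit="box">Widgets</inv:item>'
    '<hr:item id="7">Alice</hr:item>'
    "</order>"
)

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its namespace URI, so the two "item"
# elements keep distinct names and do not collide.
tags = [child.tag for child in root]

# Attributes declared in a start tag are exposed as a name-to-value mapping.
inventory_attrs = dict(root[0].attrib)

# A prefix-to-URI mapping selects elements from one namespace only.
ns = {"inv": "urn:example:inventory"}
inventory_items = [e.text for e in root.findall("inv:item", ns)]

print(tags)
print(inventory_attrs)
print(inventory_items)
```

Note that attribute values are always returned as strings; any conversion to numbers or other types is left to the application.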
To facilitate interaction with XML documents via high-level object-oriented programming languages, for instance, an XML Application Program Interface (API) can be employed. An XML API provides a set of rules or a protocol to enable communication between an XML document and a client application. There are a number of different types of XML APIs. For example, there are push APIs like SAX (Simple API for XML), where a parser reads a document serially and pushes the parsed data to a client application. Another type of XML API is a tree-based API such as DOM (Document Object Model). With this type of API, an XML document is parsed and an object model comprising a tree or hierarchy of nodes for XML elements, attributes, etc., is constructed and housed in memory. Methods can then be directed at the object model for retrieving, modifying and adding data to the document. There are also query-based APIs. With these, XML documents are searched and data is returned in accordance with the search utilizing XPath (XML Path Language), for example. XPath is a language that locates and addresses information in an XML document by navigating through its elements and attributes. XPath queries are specified in terms of paths and expressions. In essence, XPath is incorporated into an application written in a programming language (e.g., C#, Java, Visual Basic . . . ) just as SQL (Structured Query Language) is. A user can query a database from within an application by specifying SQL queries. A database management system will receive these queries and return the results. Likewise, an XPath query can be specified within an application, and an XPath engine will process the query against an XML document and return the requested information.
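The difference between a push API and a tree-based API can be sketched in Python with the standard xml.sax and xml.dom.minidom modules. This is only an illustration of the two styles; the sample document and the TitleHandler class are hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


class TitleHandler(xml.sax.ContentHandler):
    """Push style: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

# Tree style: the whole document is built in memory, then read and modified.
dom = minidom.parseString(DOC)
books = dom.getElementsByTagName("book")
dom_titles = [b.getElementsByTagName("title")[0].firstChild.data for b in books]

books[0].setAttribute("id", "100")         # modify existing data
new_book = dom.createElement("book")       # add a new node
new_book.setAttribute("id", "3")
dom.documentElement.appendChild(new_book)
book_count = len(dom.getElementsByTagName("book"))
```

The push handler never holds more than the current title in memory but cannot go back to earlier nodes, whereas the tree-based model allows random access and modification at the cost of keeping the entire document in memory.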
Conventional APIs for processing XML are notoriously difficult to use. They provide a document object model over data in the XML document utilizing types such as XmlNode and XPathNavigator. To navigate a typical XML document object model, a user has to deal with several classes whose interaction is often complex and counterintuitive. For example, the System.Xml.XPath namespace contains a variety of classes that allow users to load an XML document, to navigate toward specific nodes using XPath queries represented by plain strings, and then to imperatively iterate through these nodes to pick out the values that a user is ultimately interested in retrieving. The iterator iterates over a collection of navigators, and each navigator has a Select( ) method that returns an iterator over a new set of navigators that point to the selected nodes.
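The load, query, and iterate pattern described above can be sketched with a rough Python analogue. This is not the System.Xml.XPath API: it uses the standard xml.etree.ElementTree module, which supports only a subset of XPath, and the sample document and its names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    "<library>"
    "<shelf name='fiction'>"
    "<book year='1965'><title>Dune</title></book>"
    "<book year='1815'><title>Emma</title></book>"
    "</shelf>"
    "<shelf name='poetry'>"
    "<book year='1855'><title>Leaves of Grass</title></book>"
    "</shelf>"
    "</library>"
)

# Step 1: load the document.
root = ET.fromstring(DOC)

# Step 2: select nodes with a query held in a plain string. The string is
# opaque to the host language and is only interpreted when it is evaluated.
# Step 3: iterate imperatively, issuing a further select on each result.
titles = []
for shelf in root.findall("./shelf[@name='fiction']"):
    for book in shelf.findall("./book"):
        # Values come back as untyped strings and must be converted by hand.
        if int(book.get("year")) > 1900:
            title = book.find("title")
            if title is not None:
                titles.append(title.text)

print(titles)
```

Even in this small case, the intent (titles of fiction books published after 1900) is spread across string queries, nested loops, manual type conversion, and null checks.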