Structured data is often used to store large amounts of data for enterprise-class applications and systems. Such data typically relates to an organization, for example its inventory, product catalogs, sales, payroll, employees, accounts, locations, customers, and vendors. This structured data is increasingly stored and manipulated in trees, such as XML trees. Compared with the relational databases traditionally used to store and manipulate structured data, trees offer a number of significant advantages. For example, trees can better represent hierarchical relationships and are often more portable across different computer platforms and/or software systems. Moreover, trees, unlike relational database tables, lend themselves to transformation into different data representations, such as an alternative representation of the data in the tree. However, unlike relational databases, trees are often not associated with a database server and lack a standard Application Programming Interface (API), comparable to SQL, that supports both querying and manipulating (e.g., updating, deleting, inserting, and transforming) data.
Traditionally, tree-processing APIs have not met programmers' needs efficiently, which in turn creates problems when developing enterprise-class applications that use trees as a data source. For example, a dichotomy often exists between querying and transforming data represented in a tree, even though transformations and queries are often used together: a single transformation often performs one or more queries on the input tree, for instance to find the elements and attributes to include in the transformed tree.
For example, a number of different XML APIs are available for querying, such as XQuery and LINQ to XML, and others for transformation, such as XSLT. These XML APIs are often lazy and impure when performing imperative operations; that is, query results are computed on demand while the underlying tree may be mutated, so undesirable side effects or obscure problems often occur.
For example, when imperative updates are performed on multiple nodes during a transformation, an instance of the “Halloween” problem occurs. In particular, when the first matching node is deleted or updated, the ongoing query for matching nodes stops, because the node list it is traversing has been cut off. API calls such as LINQ to XML's ToList() are available to take a snapshot of the query results before any updates are performed, so that iteration continues over that copy. However, a programmer must call such a function explicitly to obtain pure updates, and many end-user programmers are unaware of this need, as evidenced by numerous bug reports. In addition, creating a snapshot in every transformation is expensive in both time and memory, especially if only a few nodes are updated.
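The same class of problem can be reproduced outside LINQ to XML. The following minimal sketch uses Python's standard xml.etree.ElementTree module, a substitute chosen only for illustration and not one of the APIs discussed above, together with a hypothetical four-item catalog. Deleting children while iterating the live child sequence leaves some of them behind, whereas iterating over the list returned by findall(), which here plays the role of a ToList() snapshot, removes all of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: every <item> should be deleted by the "transformation".
XML = "<catalog><item id='1'/><item id='2'/><item id='3'/><item id='4'/></catalog>"


def delete_live(xml: str) -> list:
    """Delete each <item> while iterating the live child sequence.

    Removing a child shifts the remaining children left, so the ongoing
    iteration skips nodes: a small instance of the "Halloween" problem.
    """
    root = ET.fromstring(xml)
    for item in root:
        root.remove(item)
    return [e.get("id") for e in root]


def delete_with_snapshot(xml: str) -> list:
    """Snapshot the query result first, then update the tree.

    findall() returns a new Python list, so later removals cannot disturb
    the sequence being iterated (analogous to calling ToList()).
    """
    root = ET.fromstring(xml)
    for item in root.findall("item"):
        root.remove(item)
    return [e.get("id") for e in root]


print(delete_live(XML))           # not every item is removed
print(delete_with_snapshot(XML))  # []
```

The snapshot version is correct but copies the full match list up front, which is the time and memory cost noted above.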
In addition, many tree-based APIs, such as DOM-based APIs, are designed to operate in memory and cannot handle the extremely large tree datasets that an enterprise-class application is expected to process. Using more than one API, depending on the amount of data to be processed, adds complexity to an application. Accordingly, programmers may choose a slower non-in-memory API even when the tree being processed is small enough to be manipulated with an in-memory API.
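The split between in-memory and non-in-memory processing can be illustrated, again using Python's xml.etree.ElementTree purely as an illustrative substitute: ET.parse() builds the entire tree in memory, whereas ET.iterparse() delivers parse events incrementally. The minimal sketch below counts hypothetical <item> elements both ways; the streaming version clears already-processed children from the root so that they can be released as parsing proceeds. The same task has to be written twice, against two different APIs.

```python
import io
import xml.etree.ElementTree as ET


def count_items_in_memory(source) -> int:
    """DOM-style processing: the whole document is materialized first."""
    tree = ET.parse(source)
    return len(tree.getroot().findall("item"))


def count_items_streaming(source) -> int:
    """Streaming processing: handle each <item> as its end tag is parsed."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    count = 0
    for event, elem in events:
        if event == "end" and elem.tag == "item":
            count += 1
            root.clear()  # drop processed children so they can be garbage-collected
    return count


# Hypothetical catalog with 1,000 <item> elements.
data = b"<catalog>" + b"<item/>" * 1000 + b"</catalog>"
print(count_items_in_memory(io.BytesIO(data)))  # 1000
print(count_items_streaming(io.BytesIO(data)))  # 1000
```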
Furthermore, many tree-processing APIs are not strongly typed. As a result, their implementations often cannot catch errors early or optimize for the input data.
The above-described deficiencies of tree-processing APIs are merely intended to provide an overview of some of the problems of today's tree-processing APIs, and are not intended to be exhaustive. Other problems with the state of the art may become further apparent upon review of the description of various non-limiting embodiments of the invention that follows.