In business and scientific applications hierarchies appear in many scenarios: organizational or financial data, for example, is typically organized hierarchically, while the sciences routinely use hierarchies in taxonomies, say for animal species. In the underlying relational database management systems (RDBMS) they are represented in hierarchical tables using relational tree encodings. Looking at typical queries especially in analytic applications, it is noted that hierarchies serve mainly two purposes. The first is structural pattern matching, i.e., filtering and matching rows based on their positions in a hierarchy. The second is hierarchical computations: propagating measures and performing aggregation-like computations alongside the hierarchical dimension. To address both purposes on RDBMS level, two challenges need to be solved namely (1) how can a user express a task at hand intuitively and concisely in SQL (expressiveness)? and (2) how can the engine process these SQL queries efficiently (efficiency)?
Regarding pattern matching queries, both can be considered adequately solved, as they boil down to straightforward filters and structural joins on hierarchy axes such as “is-descendant”, and techniques for appropriate indexes and join operators are well-studied. The same cannot be said of hierarchical computations. For the purpose of computations, a subset of the hierarchy nodes is dynamically associated with values to be propagated or aggregated, and possibly filtered. In analytic applications, this has always been a routine task: Dimension hierarchies are typically modeled by denormalized leveled tables such as City—State—Country—Continent. Certain computations can then be expressed using SQL's basic grouping mechanisms (in particular ROLLUP). However, this is insufficient for computations beyond simple rollups, especially when the hierarchy is not organized into levels but exhibits an irregular structure—where nodes on a level may be of different types—and arbitrary depth. Consider the hierarchy in diagram 100 of FIG. 1. Suppose it is desired to compute weighted sums of some values attached to the leaves—how could one state a rollup formula incorporating the edge weights? This quickly turns exceedingly difficult in SQL. One tool that comes to mind are recursive common table expressions (RCTEs). However, more intricate aggregation-like computations tend to result in convoluted, inherently inefficient statements. Lacking RDBMS support, today users resort to stored procedures or client code as workarounds. These are unsatisfactory not only concerning expressiveness, they also ignore the known hierarchy structure and are thus handicapped in terms of efficiency.