A large database can include voluminous amounts of data organized as columns and rows within tables of the database. Sometimes, specific types of summary data need to be calculated for the entire database. When a summary operation is executed, the performance of the database can be negatively impacted as the summary operation consumes resources used by the database. Thus, it is desirable to select a summary operation that minimizes its impact on the database during processing.
Accordingly, prior to processing, the data included within the database may often be efficiently viewed as a tree data structure to improve processing throughput when an operation is performed against the entire database. The tree data structure can be derived from relationships included within the data. For example, each cell of a database includes a row name and a column name; however, each cell can also include a right child link, a left child link, and a parent link. If a cell has a null parent link, then it is deemed the root of the tree. If all of a cell's child links are null, then it is deemed a leaf node of the tree. Moreover, a cell is often referred to as a node, and the node can be designated as a parent node, a child node, a sibling node, a root node, and/or a leaf node.
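The cell-as-node view described above can be illustrated with a minimal Python sketch. The names `Cell`, `attach`, `is_root`, and `is_leaf` are hypothetical and do not come from any particular database system. The sketch also assumes that a node counts as a leaf only when both of its child links are null, which is the conventional reading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Cell:
    """Hypothetical database cell viewed as a binary-tree node."""

    row_name: str
    column_name: str
    parent: Optional["Cell"] = None
    left: Optional["Cell"] = None
    right: Optional["Cell"] = None

    def is_root(self) -> bool:
        # A null parent link marks the root of the tree.
        return self.parent is None

    def is_leaf(self) -> bool:
        # Assumption: a leaf is a node whose child links are both null.
        return self.left is None and self.right is None


def attach(parent: Cell, child: Cell, side: str) -> Cell:
    """Link child under parent on the given side ("left" or "right")."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    setattr(parent, side, child)
    child.parent = parent
    return child


# Example tree: one root cell with two child cells.
root = Cell("row1", "colA")
left_child = attach(root, Cell("row2", "colA"), "left")
right_child = attach(root, Cell("row3", "colA"), "right")
```

Here `root` has a null parent link and is therefore the root, while `left_child` and `right_child` are siblings and, having no children of their own, leaf nodes.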
Once the database is viewed and processed as a tree data structure, a proper selection of a tree summary operation that will efficiently traverse the tree is important. An incorrect selection of a tree summary operation can substantially increase resource (e.g., processor and memory) consumption and thereby adversely impact the processing throughput of the database.
Conventionally, the selection of a tree summary operation is done when the database is first designed, and the summary operation remains in use regardless of how the state of the tree subsequently changes. As a result, a poor initial choice of a tree summary operation can remain problematic until manual analysis is done to select a better-performing tree summary operation. Moreover, any subsequent change to add a new tree summary operation is often done in an ad hoc manner or through manual analysis, and although the subsequent selection may be an improvement over the initial tree summary operation selection, there are no automatic techniques that are used to select the most efficient tree summary operation.
As a result, present techniques rely on chance and manual analysis to improve the operational performance of any selected tree summary operation. Furthermore, in some cases, additional data structures may need to be implemented in order to support a particular tree summary operation. For example, one tree summary operation operates and maintains a tree summary by keeping track of summaries at various levels within the tree. When a lower-level summary is changed, then all the summaries at a higher level are updated to reflect the lower-level summary change. In other cases, no additional data structures are needed at all, since the tree summary operation traverses the entire tree, visiting each node of the tree when calculating a tree summary. In still other cases, summaries are stored separately from the tree, and when a particular node changes, the stored summaries are updated to reflect the changed node. Further, such changes to a working database after initial loading can be extremely time-consuming and disruptive to productive use of a system.
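The three kinds of tree summary operation mentioned above can be contrasted in a short Python sketch. It assumes the summary is a simple sum of node values. All names (`Node`, `subtree_sum`, `full_traversal_sum`, `update_with_level_summaries`, `ExternalSummary`) are hypothetical illustrations, not the operations of any specific system.

```python
from typing import Optional


class Node:
    """Hypothetical tree node holding one numeric value."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.parent: Optional["Node"] = None
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        # Extra per-node field needed only by the level-summary strategy.
        self.subtree_sum = value

    def add_child(self, side: str, child: "Node") -> "Node":
        setattr(self, side, child)
        child.parent = self
        # Keep the cached sums on the path to the root consistent.
        node: Optional["Node"] = self
        while node is not None:
            node.subtree_sum += child.subtree_sum
            node = node.parent
        return child


def full_traversal_sum(node: Optional[Node]) -> int:
    """No extra structures: visit every node each time a summary is needed."""
    if node is None:
        return 0
    return node.value + full_traversal_sum(node.left) + full_traversal_sum(node.right)


def update_with_level_summaries(node: Node, new_value: int) -> None:
    """Summaries kept in the tree: a change propagates to every higher level."""
    delta = new_value - node.value
    node.value = new_value
    current: Optional[Node] = node
    while current is not None:
        current.subtree_sum += delta
        current = current.parent


class ExternalSummary:
    """Summary stored separately from the tree and patched on each change."""

    def __init__(self, root: Node) -> None:
        self.total = full_traversal_sum(root)

    def record_change(self, old_value: int, new_value: int) -> None:
        self.total += new_value - old_value


# Build a small tree, then change one leaf under each strategy.
root = Node(1)
a = root.add_child("left", Node(2))
b = root.add_child("right", Node(3))
c = a.add_child("left", Node(4))

external = ExternalSummary(root)
old_value = c.value
update_with_level_summaries(c, 10)
external.record_change(old_value, 10)
```

In this sketch the full traversal costs time proportional to the number of nodes on every query but needs no additional storage. The level-summary approach answers queries from `root.subtree_sum` but touches every ancestor on each update. The external summary updates in constant time but must be kept in step with every change to the tree. Which trade-off is preferable depends on the attributes of the tree and the mix of queries and updates.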
Thus, even when a new tree summary operation is desired because the performance of an existing tree summary operation is not acceptable, it may still be time-consuming and expensive to implement a new tree summary operation. This is because additional data structures may need to be integrated into the original database design in order to support a new tree summary operation.
As is apparent, there exists a need for improving techniques that identify and select optimal tree summary operations for a tree, where the tree is a logical representation of a database. Moreover, there exists a need to derive an algorithm that can be used to predict the performance of one or more tree summary operations, when attributes of the tree change.