1. Field of the Invention
The present invention is directed to the field of database management, and, more specifically, to using a compressed data structure to estimate the amount of data processed by a query.
2. Description of the Prior Art
Prior to executing a query, a database management system (DBMS) may determine a “plan” for executing the query in the most efficient manner. To determine the plan, the DBMS estimates the amount of data that will be processed by the query at each stage of the execution. To make such an estimate, the DBMS may use a data structure referred to as a “trie.” The trie is a model of a set of strings stored in a collection of data such as, for example, a relational data table. The trie enables the DBMS to quickly determine the number of strings in the collection of data that match a LIKE predicate in a query.
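By way of illustration only, the following Python sketch shows one way a trie might be used to count the strings that match a prefix pattern such as LIKE 'app%'. The per-node counter, the class name CountingTrieNode, and the function names are assumptions made for this example and are not taken from any particular DBMS.

```python
class CountingTrieNode:
    """Hypothetical node that records how many stored strings pass through it."""

    def __init__(self):
        self.children = {}  # maps a single character to a child node
        self.count = 0      # number of stored strings whose path includes this node


def build_counting_trie(strings):
    """Insert every string character by character, updating the counters."""
    root = CountingTrieNode()
    for s in strings:
        node = root
        node.count += 1  # every string passes through the root
        for ch in s:
            node = node.children.setdefault(ch, CountingTrieNode())
            node.count += 1
    return root


def estimate_prefix_matches(root, prefix):
    """Return how many stored strings begin with `prefix` (e.g. LIKE 'app%')."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return 0  # no stored string begins with this prefix
    return node.count


trie = build_counting_trie(["apple", "apply", "applying", "seated", "seating"])
print(estimate_prefix_matches(trie, "app"))   # strings beginning with "app"
print(estimate_prefix_matches(trie, "seat"))  # strings beginning with "seat"
```

In this sketch the lookup cost depends on the length of the prefix rather than on the number of stored strings, which is why such a structure may be attractive for estimation before a query is executed.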
An exemplary conventional trie is shown in FIG. 1. The exemplary trie of FIG. 1 includes the following strings: apple, apply, applying, seated, and seating. As shown, the top node 110 in trie 100, which may be referred to as the “root” node, is empty. Each of the remaining nodes includes a single character. A square node identifies the last letter in each string. Each string is formed by tracing a path from the root node to the corresponding square node and concatenating the characters stored in the nodes along the path.
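By way of illustration only, the structure described above may be sketched in Python as follows. The names TrieNode, is_end, build_trie, stored_strings, and count_nodes are hypothetical and chosen for this example; the is_end flag plays the role of the square nodes of FIG. 1.

```python
class TrieNode:
    """One node of a conventional trie: a single character plus child links."""

    def __init__(self, char=""):
        self.char = char      # empty for the root node
        self.children = {}    # maps a character to the child holding it
        self.is_end = False   # True where a stored string ends (a "square" node)


def build_trie(strings):
    """Insert each string one character per node."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            if ch not in node.children:
                node.children[ch] = TrieNode(ch)
            node = node.children[ch]
        node.is_end = True
    return root


def stored_strings(node, path=""):
    """Recover each string by concatenating characters from the root to an end node."""
    path += node.char
    found = [path] if node.is_end else []
    for ch in sorted(node.children):
        found.extend(stored_strings(node.children[ch], path))
    return found


def count_nodes(node):
    """Count every node in the trie, including the empty root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = build_trie(["apple", "apply", "applying", "seated", "seating"])
print(stored_strings(root))  # ['apple', 'apply', 'applying', 'seated', 'seating']
print(count_nodes(root))     # 19
```

For the five exemplary strings, this sketch builds 19 nodes including the root, and the characters "i", "n", and "g" each appear in two separate branches.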
A conventional trie such as trie 100 of FIG. 1 has several drawbacks. Because each node in the trie includes only a single character, the trie may include a large number of nodes that occupy a large amount of memory. Furthermore, character-by-character matching may require considerable time to perform, thereby delaying query execution. Another drawback is that repetitive suffixes such as “ing”, which is a suffix in both “applying” and “seating”, are stored in the trie multiple times. Such suffix repetition increases the amount of memory required to store the trie and increases the time required to perform matching. Thus, there is a need in the art for a “compressed” trie, in which multiple characters may be stored in a single node. Furthermore, it is desired that repetitive suffixes be identified and eliminated from such a compressed trie.