1. Field of the Invention
The present invention relates to computer-implemented search techniques, and more specifically to arrangements for improving search methodologies in dynamically balanced trees.
2. Description of the Related Art
Different computer-based search techniques have been developed in an effort to reduce the time necessary to search for data records within a computerized database. Examples of conventional search techniques include sequential searches of database records in an ordered table, or binary searches of database records based on a key entry that uniquely identifies a database record.
These different computer-based search techniques often use a prescribed index to search for a database record. For example, database records may be indexed according to a tree arrangement to quickly insert and locate data. One limitation of trees is that searching of the tree is optimized based on the composition of the "key" used to identify tree elements. For example, the actual data, or an abstraction or subset of the data, is used to uniquely identify each element within the tree. Although trees can be built dynamically as new data entries are added, a disadvantage of trees is that the efficiency of a search operation (i.e., minimizing the average number of data records that need to be accessed during a search) is substantially degraded if the data is not evenly distributed within the tree. Reduced search efficiency can be critical to performance in time-sensitive applications, such as networked switching systems that need to access network address forwarding information in real time.
One aspect of even distribution of data involves whether the tree is balanced structurally. FIGS. 1A and 1B are diagrams illustrating a tree having a balanced structure and a tree having an unbalanced structure, respectively. The balanced tree 10, illustrated as a binary tree, has a group of elements 12 that are evenly distributed throughout the tree 10, enabling searches throughout the tree 10 to be performed efficiently relative to the total number of elements 12; in other words, a search through the tree 10 would require at most 4 accesses, based on the depth of the tree 10, where the number of accesses per search is on the order of log2(n), n being equal to the number of elements 12. FIG. 1B is an example of a tree 14 having an unbalanced structure, resulting in a possibly substantially greater number of accesses per search throughout the tree 14. On average, a relatively larger number of accesses per search is required to find any given element 12 in the unbalanced tree 14.
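The effect of structural balance on the worst-case number of accesses can be illustrated with a minimal Python sketch. The element count of 15, the integer keys, and all function names are assumptions chosen for illustration and are not taken from FIGS. 1A and 1B. The sketch builds one binary search tree from keys inserted in a balance-friendly order and another from the same keys inserted in sorted order, which degenerates into a chain.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain binary search tree insertion with no rebalancing."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def accesses(root, key):
    """Count the elements accessed to find key; None if key is absent."""
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            return count
        node = node.left if key < node.key else node.right
    return None


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def balanced_order(sorted_keys):
    """Reorder sorted keys so that plain insertion yields a balanced tree."""
    if not sorted_keys:
        return []
    mid = len(sorted_keys) // 2
    return ([sorted_keys[mid]]
            + balanced_order(sorted_keys[:mid])
            + balanced_order(sorted_keys[mid + 1:]))


keys = list(range(1, 16))                # 15 hypothetical elements
balanced = build(balanced_order(keys))   # depth 4
degenerate = build(keys)                 # sorted insertion: a chain of 15

worst_balanced = max(accesses(balanced, k) for k in keys)
worst_degenerate = max(accesses(degenerate, k) for k in keys)
print(worst_balanced, worst_degenerate)
```

Under these assumptions the balanced tree needs at most 4 accesses per search, while the degenerate tree needs up to 15, one per element.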
A possibly more important aspect of even distribution in a tree involves the distribution of data elements within the tree relative to the frequency with which the data elements are accessed. In particular, the use of trees for searching for a data element provides a reasonable level of efficiency when the data elements in the database have an equal probability of being accessed by a search process. Although this assumption may be true for randomly-distributed groups of data entries such as entries for telephone directories and the like, this assumption is not true for relatively ordered groups of data entries. Use of trees for searching of relatively ordered data entries, or data systems having only a very few heavily-accessed data elements, may cause the tree-type index to have a lumpy distribution.
For example, the problem of lumpy distribution is substantial in network switches and/or network routers that rely on computer addresses in directing traffic through a network. The problems associated with lumpy distribution may be especially acute if heavily trafficked network nodes, for example servers or gateways, are located deep within the tree 14 (e.g., elements 12b and 12c), whereas network nodes having little traffic are located at or near the root of the tree (e.g., element 12a).
Tree rotation techniques are known for reconfiguring the unbalanced tree 14 into a more balanced structure similar to the balanced tree 10 of FIG. 1A. Although such techniques can balance the structure of the tree, the rotation techniques cannot account for the relative amount of traffic for particular nodes of the tree. Thus, the rotation techniques will result in a structurally balanced tree that still contains a lumpy distribution of data elements; hence, search operation efficiency is still degraded due to the large number of accesses needed to locate a heavily accessed element, for example a server or gateway, that is located deep within the tree. If a highly sought after element is positioned at the bottom of a balanced tree, resulting in a lumpy distribution, then the average number of accesses per search will trend towards the depth of the tree, reducing the efficiency of the search operation during use of the balanced tree.
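The trend of the average toward the tree depth can be made concrete with a short Python sketch that weights the depth of each element by how often it is accessed. The 15-element balanced tree, the access counts, and the function names are hypothetical values chosen only for illustration.

```python
def depths_of_balanced_tree(sorted_keys, depth=1, out=None):
    """Map each key to its depth (root = 1) in a balanced binary tree."""
    if out is None:
        out = {}
    if sorted_keys:
        mid = len(sorted_keys) // 2
        out[sorted_keys[mid]] = depth
        depths_of_balanced_tree(sorted_keys[:mid], depth + 1, out)
        depths_of_balanced_tree(sorted_keys[mid + 1:], depth + 1, out)
    return out


def average_accesses(depths, frequency):
    """Average accesses per search, weighted by access frequency."""
    total = sum(frequency.values())
    return sum(depths[k] * frequency[k] for k in frequency) / total


keys = list(range(1, 16))
depths = depths_of_balanced_tree(keys)

# Every element equally likely to be searched.
uniform = {k: 1 for k in keys}

# Hypothetical lumpy traffic: key 1, a leaf at depth 4, dominates.
lumpy = {k: 1 for k in keys}
lumpy[1] = 986

print(average_accesses(depths, uniform))  # 49/15, about 3.27
print(average_accesses(depths, lumpy))    # 3.989, close to the depth of 4
```

With uniform traffic the average is about 3.27 accesses per search; with the assumed lumpy traffic it rises to 3.989, nearly the full depth of the tree, even though the tree is structurally balanced.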
There is a need for reducing the adverse effects of trees having a lumpy distribution of elements when searching for a given key entry, without sacrificing a search operation for finding or verifying the existence of a key, and without the necessity of constant maintenance of the tree.
There is also a need for an arrangement that enables the bypassing of a tree for identification of the most heavily accessed elements with the minimum number of search operations, regardless of the location of the most heavily accessed elements within the tree.
These and other needs are attained by the present invention, where additional pointers for each data element are generated that enable the data elements to be searched as a linked list in an order based on determined importance values of the elements within the tree. The searching of the data elements indexed according to a linked list, as opposed to the existing tree structure, enables the search engine of a computer-based system to identify heavily-trafficked elements of the tree with a minimal number of searches.
According to one aspect of the present invention, a method is provided in a computer system of searching for a specified key entry. The method includes determining an importance value for each element of a tree structure used for searching for the specified key entry. Each of the elements has a corresponding key entry, and the importance value indicates a first probability that the corresponding element includes the specified entry. The method also includes linking at least a first number of the elements according to an order based on the respective determined importance values, determining an estimated average number of accessed elements per search of at least one of the linked first elements and the tree structure, and searching selected linked first elements for the specified key entry, prior to searching the tree structure. The searching of the selected linked first elements of the tree structure is based on the respective importance values being greater than the estimated average number of accessed elements per search. The linking of the first number of the elements according to an order based on the respective determined importance value enables a search engine to identify the most heavily-searched elements of the tree with a minimum number of searches, based on the determined importance values for each of the linked first elements. In addition, the determination of an estimated average number of accessed elements per search enables the linked first elements to identify a point at which a search process should discontinue searching through the linked first elements and begin searching of the tree structure for the specified key entry.
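The steps of this method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the class and function names, the use of an access count as the importance value, the integer keys, and the fixed estimated average of 4 are all hypothetical. The cutoff follows the wording above literally, walking the linked elements only while their importance value is greater than the estimated average number of accessed elements per search.

```python
class Element:
    def __init__(self, key, importance):
        self.key = key
        self.importance = importance     # assumed: an access count
        self.left = None                 # tree pointers
        self.right = None
        self.next_by_importance = None   # additional linked-list pointer


def tree_insert(root, elem):
    if root is None:
        return elem
    node = root
    while True:
        side = "left" if elem.key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, elem)
            return root
        node = child


def tree_search(root, key):
    """Return (element or None, number of elements accessed)."""
    accessed, node = 0, root
    while node is not None:
        accessed += 1
        if key == node.key:
            return node, accessed
        node = node.left if key < node.key else node.right
    return None, accessed


def link_by_importance(elements):
    """Link elements in descending order of importance; return the head."""
    ordered = sorted(elements, key=lambda e: e.importance, reverse=True)
    for current, following in zip(ordered, ordered[1:]):
        current.next_by_importance = following
    return ordered[0] if ordered else None


def search(head, root, key, estimated_average):
    """Search the linked elements first, then fall back to the tree."""
    accessed, node = 0, head
    while node is not None and node.importance > estimated_average:
        accessed += 1
        if node.key == key:
            return node, accessed
        node = node.next_by_importance
    found, tree_accessed = tree_search(root, key)
    return found, accessed + tree_accessed


def balanced_order(items):
    if not items:
        return []
    mid = len(items) // 2
    return [items[mid]] + balanced_order(items[:mid]) + balanced_order(items[mid + 1:])


# Hypothetical data: keys 1 and 15 are heavily accessed leaves.
importance = {k: 1 for k in range(1, 16)}
importance[1], importance[15] = 900, 50
elements = [Element(k, importance[k]) for k in range(1, 16)]

root = None
for elem in balanced_order(elements):
    root = tree_insert(root, elem)
head = link_by_importance(elements)

ESTIMATED_AVERAGE = 4  # assumed estimate, here the depth of the tree
found, accessed = search(head, root, 1, ESTIMATED_AVERAGE)
print(found.key, accessed)
```

In this sketch the heavily accessed key 1 is found with 1 access through the linked list, compared with 4 accesses through the tree alone. A search for a lightly accessed element such as the root key 8 first visits the two linked elements whose importance exceeds the estimate and then finds the key in the tree, for 3 accesses in total.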
Hence, the searching for the most heavily-accessed elements can be optimized, without substantially increasing the number of accesses necessary per search for finding an element in the tree structure.
Additional advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.