1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for parallel execution of operations for a partitioned binary radix tree on a parallel computer.
2. Description of Related Art
A database is an aggregation of data that has an organized structure in the memory of a computer system. Data in a database is often organized using an index, which is a tree data structure that defines the organization of data in memory in a way that allows for fast searching and dynamic sorting of the data. A database typically includes data structures, called ‘tables,’ that contain records, and includes the indexes that define how the records can be accessed by the computer system. Each record includes a key that identifies the record and that can be searched for and sorted on. An index provides a logical ordered list of the records in a database by storing the key values of the records as entries in the tree data structure implementing the index. A computer system may use the index to facilitate fast searching for a record that has a particular key by looking up the key in the sorted entries of the tree data structure implementing the index.
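The relationship between a table, its keys, and an ordered index can be illustrated with a minimal sketch. The sketch assumes a hypothetical in-memory table and uses a sorted list with binary search in place of a tree; the names `records`, `index`, and `lookup` are illustrative only and do not appear in the related art.

```python
from bisect import bisect_left

# Hypothetical table: each record carries a key that identifies it.
records = [
    {"key": "delta", "value": 4},
    {"key": "alpha", "value": 1},
    {"key": "charlie", "value": 3},
    {"key": "bravo", "value": 2},
]

# The index: key values in sorted order, each paired with the position
# of its record in the table. A sorted list stands in for the tree here.
index = sorted((rec["key"], pos) for pos, rec in enumerate(records))
index_keys = [key for key, _ in index]


def lookup(key):
    """Return the record having the given key, or None if no record has it."""
    i = bisect_left(index_keys, key)  # binary search over the sorted entries
    if i < len(index_keys) and index_keys[i] == key:
        return records[index[i][1]]
    return None
```

In this sketch the table itself stays in insertion order; only the index is ordered, which is what allows the binary search.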
A tree data structure is typically composed of a plurality of nodes logically connected in a manner that resembles an inverted tree. In many tree data structures, the key values or entries in the tree are stored in the various nodes of the tree. Leaf nodes are nodes in the tree that have no children. By contrast, the root node of the tree is the node in the tree that has no parent. Nodes logically positioned between the root node and the leaf nodes are referred to as limb nodes and have both a parent and at least one child.
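The three kinds of nodes described above can be distinguished purely from parent and child links. The following is a minimal sketch under that assumption; the `Node` class and the `classify` function are hypothetical names introduced for illustration.

```python
class Node:
    """A tree node that knows its parent and its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def classify(node):
    """Classify a node as 'root', 'leaf', or 'limb' from its links.

    A tree consisting of a single node has a root that is also childless;
    this sketch reports such a node as 'root'.
    """
    if node.parent is None:
        return "root"
    if not node.children:
        return "leaf"
    return "limb"


# A small example tree: root -> limb -> leaf_a, and root -> leaf_b.
root = Node("root")
limb = Node("limb", parent=root)
leaf_a = Node("leaf_a", parent=limb)
leaf_b = Node("leaf_b", parent=root)
```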
Because tree data structures often store large quantities of data, maintaining an entire tree in a computer system's primary storage is typically infeasible or impractical. Tree data structures, therefore, are generally partitioned into logical pages. Each logical page is a block of data that stores a subtree of the nodes in the tree data structure. In computer systems that cannot store the entire tree in primary storage, each logical page is paged as a unit between primary and secondary storage as needed by the computer system. In partitioning a tree, the goal is to minimize the amount of data that must be paged to locate a particular key, thus increasing system performance. The trunk page is the first, or topmost, logical page of a tree. The leaf pages are the bottom-most logical pages in the tree. The limb pages are logical pages between the leaf pages and the trunk page.
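The effect of page size on the number of pages touched during a search can be sketched as follows. The sketch makes simplifying assumptions that are not taken from the related art: the tree is a complete binary tree with heap numbering (the root is node 1 and node n has children 2n and 2n + 1), and every logical page holds a subtree of a fixed height. The function names are hypothetical.

```python
def page_root(node, page_height):
    """Return the root node of the logical page that contains `node`.

    Pages are assumed to start at depths that are multiples of page_height.
    """
    depth = node.bit_length() - 1            # the root (node 1) has depth 0
    levels_below_page_top = depth % page_height
    return node >> levels_below_page_top     # climb to the top of the page


def pages_on_path(node, page_height):
    """List the page roots visited when walking from the tree root to `node`."""
    path = []
    current = node
    while True:
        top = page_root(current, page_height)
        path.append(top)
        if top == 1:                         # reached the trunk page
            break
        current = top >> 1                   # parent of a page root is in the page above
    return list(reversed(path))
```

Under these assumptions, reaching node 13 touches four pages when each page holds a single level and only two pages when each page holds two levels, which illustrates why the partitioning affects how much data must be paged.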
Many databases implement indexes using a particular type of tree data structure called a partitioned binary radix tree (‘PBRT’). PBRTs provide a space advantage over many other types of tree data structures because PBRTs store the leading characters common to multiple entries only once. PBRTs are able to store common leading characters only once by encoding the leading characters into the logical connections among the nodes that lead from the root node of the tree to a leaf node pointing to the unique trailing characters of a particular entry. The entries of a PBRT, therefore, are distributed throughout the nodes of the tree instead of being stored in their entirety within a single node.
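The space advantage of storing common leading characters only once can be illustrated with a simplified radix tree. The sketch below is an assumption-laden illustration, not a PBRT: it branches on characters rather than bits, it is not partitioned into pages, and the names `RadixNode`, `insert`, `contains`, and `stored_characters` are hypothetical.

```python
class RadixNode:
    def __init__(self):
        self.edges = {}        # first character of a label -> (label, child)
        self.is_entry = False  # True if the path to this node spells an entry


def shared_prefix_len(a, b):
    """Length of the common leading run of characters of a and b."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i


def insert(root, key):
    node = root
    while key:
        first = key[0]
        if first not in node.edges:
            leaf = RadixNode()
            leaf.is_entry = True
            node.edges[first] = (key, leaf)   # unique trailing characters
            return
        label, child = node.edges[first]
        n = shared_prefix_len(label, key)
        if n < len(label):
            # Split the edge so the shared leading characters are kept once.
            middle = RadixNode()
            middle.edges[label[n]] = (label[n:], child)
            node.edges[first] = (label[:n], middle)
            child = middle
        node, key = child, key[n:]
    node.is_entry = True


def contains(root, key):
    node = root
    while key:
        edge = node.edges.get(key[0])
        if edge is None or not key.startswith(edge[0]):
            return False
        key, node = key[len(edge[0]):], edge[1]
    return node.is_entry


def stored_characters(node):
    """Total number of characters held on all edges below `node`."""
    return sum(len(label) + stored_characters(child)
               for label, child in node.edges.values())


tree = RadixNode()
for entry in ("romane", "romanus", "romulus"):
    insert(tree, entry)
```

In this example the three entries total 20 characters, yet the tree holds 12, because the shared leading characters "rom" and "an" each appear on a single edge. No single node holds a complete entry; each entry is recovered by concatenating the labels along the path from the root.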
Because the entries of a PBRT are distributed throughout the tree, searching algorithms in the current art are performed in sequential order from the trunk page to the leaf pages of the PBRT. Such sequential algorithms, however, do not take advantage of the computing resources available using parallel computing. Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. Parallel computing is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. Because current searching algorithms for PBRTs do not take advantage of the computing resources available using parallel computing, room for improvement exists in the current art.
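The general idea of dividing a problem into smaller tasks that run simultaneously and are then coordinated can be sketched as follows. This is a generic illustration only and does not represent any particular PBRT search algorithm: it searches a flat list of keys rather than a tree, it uses Python threads, which show the structure of the division but may not yield a true speedup for this kind of work, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(chunk, target):
    """Smaller task: scan one chunk and report whether it holds the target."""
    return target in chunk


def parallel_contains(keys, target, workers=4):
    """Split `keys` into chunks, search each chunk as its own task, then combine."""
    if not keys:
        return False
    size = -(-len(keys) // workers)  # ceiling division: chunk length
    chunks = [keys[i:i + size] for i in range(0, len(keys), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is searched independently of the others.
        results = pool.map(search_chunk, chunks, [target] * len(chunks))
        # Coordination step: combine the partial answers into one result.
        return any(results)


keys = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf"]
```

The division works here because each chunk can be examined without reference to any other chunk. A structure whose entries are distributed across connected nodes does not divide this simply, which is the difficulty the preceding paragraph describes.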