A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to information processing environments and, more particularly, to a database management system (DBMS) having a methodology for providing compact B-Tree indexes.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as xe2x80x9crecordsxe2x80x9d having xe2x80x9cfieldsxe2x80x9d of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from, or updated in, such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., xe2x80x9cAn Introduction to Database Systems, Volume I and II,xe2x80x9d Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a decentralized or distributed environment. One or more PC xe2x80x9cclientxe2x80x9d systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these xe2x80x9cclient/serverxe2x80x9d systems include Powersoft(copyright) clients connected to one or more Sybase(copyright) SQL Anywhere(copyright) Studio (Adaptive Server(copyright) Anywhere) database servers. Both Powersoft(copyright) and Sybase(copyright) SQL Anywhere(copyright) Studio (Adaptive Server(copyright) Anywhere) are available from Sybase, Inc. of Dublin, Calif.
In today""s computing environment, database technology can be found on virtually any device, from traditional mainframe computers to cellular phones. Sophisticated applications, whether human resources information systems or sales force automation systems, can xe2x80x9cpushxe2x80x9d much of their complexity into the database itself. Indeed, this represents one of the main benefits of database technology. The challenge, however, is to support these applications, and the complex queries they generate, on small computing devices. At the same time, users expect the productivity and reliability advantages of using a relational DBMS.
Consider, for instance, the execution of a request for information from a relational DBMS. In operation, this request is typically issued by a client system as one or more Structured Query Language or xe2x80x9cSQLxe2x80x9d queries for retrieving particular data (i.e., data records meeting the query condition) from database tables on a server. For example, the following simple SQL SELECT statement results in a list of the names of those employees earning $10,000, where xe2x80x9cemployeesxe2x80x9d is a table defined to include information about employees of a particular organization:
SELECT name
FROM employees
WHERE sal=10,000
The syntax of SQL is well documented, see e.g., the abovementioned xe2x80x9cAn Introduction to Database Systems.xe2x80x9d For further information on SQL, see e.g., xe2x80x9cInformation Technologyxe2x80x94Database languagesxe2x80x94SQL,xe2x80x9d published by the American National Standards Institute as American National Standard ANSI/ISO/IEC 9075: 1992, the disclosure of which is hereby incorporated by reference.
For enhancing the speed in which the DBMS stores, retrieves, and presents particular data records, the DBMS usually maintains one or more database indexes on a database table. A database index, typically maintained as a B-Tree (or B+-Tree) data structure, allows the records of a table to be organized in many different ways, depending on a particular user""s needs. An index may be constructed as a single disk file storing index key values together with unique record numbers. The index key values are a data quantity composed of one or more fields from a record which are used to arrange (logically) the database file records in some desired order (index expression). The record numbers are unique pointers or identifiers to the actual storage location of each record in the database file. Both are referred to internally by the system for locating and displaying records in a database file.
Searching for a particular record in a B-Tree index occurs by traversing a particular path in the tree. To find a record with a particular key value, one would maneuver through the tree comparing key values stored at each node visited with the key value sought. The results of each comparison operation, in conjunction with the pointers stored with each node, indicate which path to take through the tree to reach the record ultimately desired. Ultimately, a search will end at a particular leaf node which will, in turn, point to (i.e., store a pointer to or identifier for) a particular data record for the key value sought. Alternatively, the leaf nodes may for xe2x80x9cclustered indexesxe2x80x9d store the actual data of the data records on the leaf nodes themselves.
An index allows a database server to find and retrieve specific rows much faster than it could without using the index. A sequential or linear scan from the beginning of a database table, comparing each record along the way, is exceedingly slow compared to using an index. There, all of the blocks of records would have to be visited until the record sought is finally located. For a table of even moderate size, such an approach yields unacceptable performance. As a result, virtually all modern-day relational database systems employ B-Tree indexes or a variant.
General techniques for the construction and operation of B-Trees are well documented in the technical, trade, and patent literature. For a general description, see Sedgewick, R., xe2x80x9cAlgorithms in C,xe2x80x9d Addison-Wesley, 1990. For a survey of various B-Tree implementations, see Comer, D., xe2x80x9cThe Ubiquitous B-Tree,xe2x80x9d Computing Surveys, Vol. 11, No. 2, June 1979, pp. 121-137. For further description of B-Tree indexes, see e.g., commonly-owned U.S. Pat. No. 6,363,376 titled xe2x80x9cDatabase system providing methodology for enhancing concurrency using row update bit and deferred locking.xe2x80x9d The disclosures of each of the foregoing references are hereby incorporated by reference.
Many B-Tree variants have been introduced in the literature and in practice. These can be classified according to the technique used for searching within a page. Most implementations are comparison-based in which searching is typically done by performing a binary search on a sorted array of keys. A variety of optimizations have been developed to improve upon this basic scheme. For an overview of such optimizations, see e.g., Graefe, G. and Larson, P. A., xe2x80x9cB-Tree Indexes and CPU caches,xe2x80x9d in 17th International Conference on Data Engineering (ICDE), pages 349-358, Washington-Brussels-Tokyo, April 2001, published by IEEE Computer Society Press.
Radix-based B-Tree variants are less common, with the majority intended for text indexing, not on-line transaction processing (OLTP). These radix-based variants can be further categorized as to whether they manipulate the on disk representation directly or not. Examples in the first group include the string B-Tree (see e.g., Ferragina, P. and Grossi, R. xe2x80x9cThe string B-Tree: A new data structure for string search in external memory and its applications,xe2x80x9d Journal of the ACM, 46(2): 236-280, 1999) and the string R-Tree (see e.g., Jagadish, H. V., Koudas, N., and Srivastava, D., xe2x80x9cOn effective multi-dimensional indexing for strings,xe2x80x9d in volume 29, pages 403-414 of the Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16-18, 2000, Dallas, Tex.). They store a compressed linearization of the search structure on disk and reconstruct it on demand. A disadvantage of this approach, particularly in an OLTP environment, is that it complicates buffer management: the reconstructed search structure likely will not fit on a single page. Further, this approach does not solve the problem of constructing a cache-efficient in-memory representation. An example of a radix-based B-Tree variant that manipulates the on-disk representation directly is Ferguson""s Bit-Tree (see e.g., Ferguson, D. xe2x80x9cBit-Tree, a data structure for fast file processing,xe2x80x9d Communications of the ACM, 35(6): 114-120, 1992). Other examples in this group include the pkB-Tree (see e.g., Bohannon, P., McIlroy, P., and Rastogi, R., xe2x80x9cMain-Memory index structures with Fixed-Size partial keys,xe2x80x9d volume 30, 2 of SIGMOD Record, pages 163-174, ACM Press), a Bit-Tree refinement suitable for main memory databases.
Although B-Tree indexes are widely used to improve DBMS performance, they add overhead to the DBMS as a whole and, therefore, these indexes need to be carefully structured and used to maximize system performance, especially when indexing long values. One approach to indexing long values is to store partial key information: keys are represented with a small normalized prefix together with an identifier (ID) of the row containing the key (if needed). If the prefix alone is not sufficient to resolve a comparison, a full compare is performed against the values in the underlying row, an expensive proposition if a cache miss is incurred. While this implementation works surprisingly well for many indexesxe2x80x94trading off the occasional full compare for better fanoutxe2x80x94there is room for improvement, especially if the schema is not carefully designed. Further, even with this type of implementation, index size can still be an issue in resource-constrained environments, such as a typical Windows CE environment.
What is required is a solution which enables a database management system to maintain more compact indexes while providing performance equivalent to existing indexing schemes. The present invention fulfills these and other needs.
The following definitions are offered for purposes of illustration, not limitation, in order to assist with understanding the discussion that follows.
B-Tree
One of the most widely used indexing schemes in database systems is the B-Tree index scheme (including implementation variants such as a B+-Tree) in which the keys or keywords are kept in a balanced tree structure and the lowest level (i.e., leaf nodes) of the tree points at the data records, or, in some cases, contains the actual data records themselves. General techniques for the construction and operation of B-Trees are well documented in the technical, trade, and patent literature. For a general description, see Sedgewick, R., xe2x80x9cAlgorithms in Cxe2x80x94Third Edition,xe2x80x9d Addison-Wesley, 1998. For a survey of various B-Tree implementations, see Comer, D., xe2x80x9cThe Ubiquitous B-Tree,xe2x80x9d Computing Surveys, Vol. 11, No. 2, June 1979, pp. 121-137. Unless otherwise specified, in this document the term B-Tree includes B-Trees, B+-Trees, and the like.
Index
In a database system, an index is a list of keys or keywords which enable a unique record to be identified. Indexes are typically used to enable specific records to be located more rapidly as well as to enable records to be more easily sorted (e.g., sorted by the index field used to identify each record).
OLTP or Transaction Processing
A transaction processing or OLTP system is a type of computer processing system in which the system responds immediately to user requests. Each user request is considered to be a transaction. Automatic teller machines for banks are one example of a transaction processing system. Transaction processing involves interaction with a user, whereas batch processing can take place without a user being present.
Patricia Tree
A Patricia tree or path-compressed binary trie is a search tree in which each non-leaf node includes a bit offset and has two children. A Patricia tree is based upon a trie structure with each node including the index of the bit to be tested to decide which path to take out of that node. The typical way that a Patricia tree is searched is to start at the root node and the tree is traversed according to the bits of the search key. For example, if a given bit offset is xe2x80x980xe2x80x99, then the search proceeds to the left child of the current node. If the bit offset is xe2x80x981xe2x80x99, then the search proceeds to the right child of the current node. For further information on Patricia trees, see Morrison, D., xe2x80x9cPATRICIAxe2x80x94Practical Algorithm to Retrieve Information Coded in Alphanumeric,xe2x80x9d Journal of the ACM, 15(4): 514-534, 1968, the disclosure of which is hereby incorporated by reference.
SQL
SQL stands for Structured Query Language, which has become the standard for relational database access, see e.g., xe2x80x9cInformation Technologyxe2x80x94Database languagesxe2x80x94SQL,xe2x80x9d published by the American National Standards Institute as American National Standard ANSI/ISO/IEC 9075: 1992, the disclosure of which is hereby incorporated by reference. For additional information regarding SQL in database systems, see e.g., Date, C., xe2x80x9cAn Introduction to Database Systems, Volume I and II,xe2x80x9d Addison Wesley, 1990, the disclosure of which is hereby incorporated by reference.
Tree
A tree is a non-empty collection of nodes (or vertices) and edges that satisfies certain requirements. A node (also referred to as a vertex) is a simple object that can have a name and can carry other associated information; an edge is a connection between two nodes. The defining property of a tree is that there is exactly one path connecting two nodes. A rooted tree is one in which one node is designated as the root of the tree. In a rooted tree, any node is the root of a subtree consisting of it and the nodes below it. Each node (except the root) has exactly one node above it, which is called its parent. The nodes directly below a given node are called its children. A leaf node is a node that has no children (or for which all the subtrees are empty). The nodes in the tree that are not leaves are referred to as internal nodes. For further information on trees, see e.g., Sedgewick, R., xe2x80x9cAlgorithms in Cxe2x80x94Third Editionxe2x80x9d, Addison Wesley, 1998, the disclosure of which is hereby incorporated by reference. In the context of a database system, the leaf node of the tree is typically associated with a data record, which enables access to data records from leaf nodes. By way of example, a data record may be accessed directly (i.e., through a pointer) from the leaf node. The leaf node may also point to a data structure (e.g., a table) which, in turn, enables access to data records. The leaf node may also contain the data record itself. Other variants are, of course, also feasible.
Trie
A trie is a binary tree that has keys associated with each of its leaves defined recursively as follows: the trie for an empty set of keys is a null link; the trie for a single key is a leaf containing that key; and the trie for a set of keys of cardinality greater than one is an internal node with a left link referring to the trie for the keys whose initial bit is 0 and right link referring to the trie for the keys whose initial bit is 1, with the leading bit considered to be removed for purposes of constructing the subtrees. For a description of tries, see e.g., Sedgewick, R., xe2x80x9cAlgorithms in Cxe2x80x94Third Editionxe2x80x9d, Addison-Wesley, 1998, the disclosure of which is hereby incorporated by reference. Searching a trie typically proceeds from the root to a leaf, where the edge taken at each node depends on the value of an attribute in the query. Typical trie implementations have the advantage of being fast, but the disadvantage of achieving that speed at great expense in storage space. For further information regarding trie structures, see e.g., Comer, D., xe2x80x9cHeuristics for Trie Index Minimization,xe2x80x9d ACM Transactions on Database Systems, Vol. 4, No. 3, September 1979, pages 383-395, the disclosure of which is hereby incorporated by reference.
An improved method for creating an index based on a path-compressed binary trie in a database system comprising database tables and indexes on those tables is described. For a given index to be created, a path-compressed binary trie for the given index is determined. The path-compressed binary trie comprises internal nodes and leaf nodes. Based on a traversal of the path-compressed binary trie, an index is created comprising a first array of internal nodes encountered during the traversal, and a second array of leaf nodes encountered during the traversal. The database system employs said first and second arrays for providing index-based access for a given key value.