Relational database management systems store data in relational databases referred to generally as tables. A table is a collection of records (e.g., rows) with defined structure (e.g., columns). Reading from a table the particular records that match some selection criteria typically requires that all records in the table be read and compared against the selection criteria to identify and select the rows satisfying the selection criteria. In order to speed up the selection of data, relational database management systems typically provide a mechanism for indexing the data with one or more database indices. Database indices map a value to a single record or set of records. Typically, an index payload contains a reference or set of references (e.g., a record number or page number and offset). The indices are typically much smaller than the table itself and therefore allow for faster identification and selection of records that satisfy given selection criteria. Searching a database table with an appropriate index is nearly equivalent, in terms of processing time, to reading a subset of records from the table.
Database indices are often implemented with a data structure referred to as a B-tree. A B-tree is a tree data structure that keeps data sorted and allows simple operations such as searches, insertions, and deletions to be realized in logarithmic amortized time. When using a B-tree to represent a database index, each node of the B-tree is a uniformly-sized index page that stores a group of index items. To limit the number of secondary storage (e.g., disk or flash memory drive) access operations required for each single page access, relational database management systems employ a data cache of a fixed-size, usually managed with a least recently used (LRU) algorithm. The data cache is then used to cache index pages in memory, as opposed to secondary storage, resulting in faster more efficient operations.
However, problems arise when the size of the index exceeds the size of the data cache. For instance, when the index fits into the fixed-size data cache in memory, operations on the index execute quickly (on the order of magnitude 10-100 microseconds on current hardware). However, when more and more data are added to the index and the index itself exceeds the fixed-size of the data cache, in all but a few very special cases, the performance of an insertion operation starts degrading very quickly since access to the secondary storage is orders of magnitude slower than memory access. In extreme cases such as when random or pseudo-random keys are utilized, most accesses to the index pages will require loading pages from, and writing pages to, the underlying secondary storage (on the order of magnitude 1-10 milliseconds per page on current hardware). When using a B-tree, accessing an index page (e.g., B-tree node) requires following a path in the tree from the index root page to an index leaf page, which is typically somewhere in the range of three to six pages. For example, as illustrated in the example B-tree 10 of FIG. 1, accessing the index leaf page 12 storing the value “28” requires following a path from the index root page 14 to an internal (non-leaf) page 16, and ultimately to the index leaf page 12. Thus, the performance degradation reaches at least two orders of magnitude.
In certain applications that utilize databases, high volumes of data may be generated and stored in a database without the need for modifying the data once it is stored. For example, some radio frequency identification (RFID) readers or scanners generate near constant streams of data, which represent events that need to be stored and indexed in real time. This RFID-based event data will typically not need to be modified, and in many cases will not be read. However, for the sake of accountability, the data must be available and in certain unusual circumstances the data will be read. In some large RFID-based systems, there may be several thousands of events being processed and stored per second. In a simplified approach, each event can be stored as a record in a database. Consequently, with large RFID-based systems, a database can grow in size rapidly.
With an extremely large database table, for performance reasons, processing a search query without an index by reading each record in the table is not a viable option. However, with extremely large database tables, a traditional database index gives rise to certain performance issues as well. For instance, utilizing a simple database index results in a very high input/output (I/O) load for processing potentially tens of thousands of pages per second. Furthermore, the amount of time required to perform an insert operation on the B-tree index becomes excessively long. This is due to the requirement that the index tree sub-paths be locked when modifying and rebalancing the nodes of the B-tree. Each lock on a tree sub-path must be held for a comparatively long time due to waiting on the completion of I/O processing, which in turn slows down parallel or concurrent writing transactions, and also the RFID readers or scanners. Consequently, even with an I/O system that is theoretically capable of sustaining the heavy I/O load, due to locking issues, performance may still suffer.
One common approach to address the problems that arise with extremely large database tables and corresponding database indices is to perform temporal partitioning of the index data. For example, referring to FIG. 2, there is illustrated a relational database management system 20 with a data cache 22 in memory storing an index partition 24. When the index partition 24 in the data cache 22 reaches a predetermined size, the index partition 24 is copied to secondary storage 26 and a new index partition is generated in the data cache 22. Consequently, when data are added to the database and the corresponding database index, insert operations are processed with the index partition 24 in the data cache 22 in memory, while older index partitions 28 in secondary storage 26 do not require locks for reading. This approach has one significant drawback. Although the performance of insert operations can be guaranteed because of the limited size of the index partition 24 in the data cache 22, there is no upper bound on read times. Specifically, as the database and corresponding database index grow over time, the number of index partitions 28 in secondary storage 26 increases linearly with time. When processing a read operation with the index, all of the index partitions in secondary storage 26 must be searched in order to find particular data satisfying particular data selection criteria. This means, the time needed for index read operations also increases linearly. Assuming random access to the data in the index partitions 28, virtually all page accesses will result in secondary storage access.