1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to determining the optimal number of tasks for building a database index in a (virtual) memory constrained environment.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD) such as magnetic or optical disk drives for semi-permanent storage.
A table can be divided into partitions, with each partition containing a portion of the table's data. By partitioning tables, the speed and efficiency of data access can be improved. For example, partitions containing more frequently used data can be placed on faster data storage devices, and parallel processing of data can be improved by spreading partitions over different DASD volumes, with each I/O stream on a separate channel path. Partitioning also promotes high data availability, enabling application and utility activities to progress in parallel on different partitions of data.
An index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file using a key (i.e., one of the fields of the record or attributes of the row). However, building an index for a large file can take a considerable amount of elapsed time. The process involves extracting a key value and record identifier (rid) value from each of the records, sorting all of the key/rid values, and then building the index from the sorted key/rid values. Typically, the extracting, sorting, and index build processes are performed serially, which can be time consuming in the case of a large database file. Additionally, even if some of the tasks are performed in parallel, due to memory constraints, there could be inefficiencies in the processes.
When data is loaded or reorganized, indexes are built that provide access to the data. Building these indexes, however, can be very time consuming. Additionally, when computer systems fail, indexes could be corrupted or destroyed, and recovery of the indexes, which involves rebuilding each index, can be very time consuming. Therefore, there is a need in the art for techniques that build indexes more efficiently.