1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to a technique for optimizing a number of tasks to be executed in a parallel database loading system.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. Relational databases are organized into tables that consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables, and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives, for semi-permanent storage.
A table can be divided into partitions, with each partition containing a portion of the table's data. Each partition may reside on a different data storage device. By partitioning tables, the speed and efficiency of data access can be improved. For example, partitions containing more frequently used data can be placed on faster data storage devices, and parallel processing of data can be improved by spreading partitions over different DASD volumes, with each I/O stream on a separate channel path. Partitioning also promotes high data availability, enabling application and utility activities to progress in parallel on different partitions of data.
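By way of illustration only, the spreading of partitions over different DASD volumes may be sketched as follows. The function name `spread_partitions`, the round-robin placement policy, and the volume labels are assumptions introduced for this example and are not taken from any particular database system.

```python
def spread_partitions(partition_count, volumes):
    """Assign partitions to volumes round-robin (a hypothetical placement policy).

    Consecutive partitions land on different volumes, so concurrent work on
    neighboring partitions does not compete for the same device.
    """
    if not volumes:
        raise ValueError("at least one volume is required")
    return {p: volumes[p % len(volumes)] for p in range(partition_count)}


# Hypothetical volume labels, for illustration only.
placement = spread_partitions(5, ["VOL001", "VOL002", "VOL003"])
for partition, volume in placement.items():
    print(f"partition {partition} -> {volume}")
```

A placement that favors faster devices for more frequently used partitions would replace the round-robin rule with a ranking of volumes by speed; the sketch above addresses only the spreading of I/O.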
In an attempt to speed up the loading of data, various approaches involving parallel processing have been tried. Parallel processing exploits the multiprocessor capabilities of modern high-speed computers and refers to the use of several processors to load data into different parts of the database in parallel. That is, data is loaded into different partitions of a database by load utilities that execute concurrently. In particular, the data to be loaded into the database is sorted and then separated into multiple input files. Then, a load utility may load data into a tablespace (i.e., read data from an input file and store the data in the tablespace).
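The conventional sequence described above (sort, separate into input files, load concurrently) may be sketched as follows. This is a minimal sketch under stated assumptions: input files and the tablespace are modeled as in-memory lists, partitions are defined by hypothetical upper key bounds, and the names `partition_of`, `split_into_input_files`, `load_partition`, and `parallel_load` are illustrative rather than part of any actual load utility.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_of(key, boundaries):
    """Return the index of the first partition whose upper bound is >= key."""
    for index, upper in enumerate(boundaries):
        if key <= upper:
            return index
    raise ValueError(f"key {key!r} exceeds the last partition boundary")


def split_into_input_files(records, boundaries):
    """Sort records by key, then separate them into one input list per partition."""
    input_files = [[] for _ in boundaries]
    for key, value in sorted(records, key=lambda record: record[0]):
        input_files[partition_of(key, boundaries)].append((key, value))
    return input_files


def load_partition(tablespace, index, input_file):
    """Stand-in for a load utility: read one input file, store it in one partition."""
    tablespace[index] = list(input_file)
    return len(input_file)


def parallel_load(records, boundaries, max_tasks):
    """Load every partition, running at most max_tasks load tasks at a time."""
    tablespace = [None] * len(boundaries)
    input_files = split_into_input_files(records, boundaries)
    with ThreadPoolExecutor(max_workers=max_tasks) as pool:
        counts = list(
            pool.map(
                lambda i: load_partition(tablespace, i, input_files[i]),
                range(len(boundaries)),
            )
        )
    return tablespace, counts


# Hypothetical data: (key, value) records and three key-range partitions.
records = [(15, "c"), (3, "a"), (27, "e"), (9, "b"), (21, "d")]
tablespace, counts = parallel_load(records, boundaries=[10, 20, 30], max_tasks=2)
print(counts)
```

In this sketch each load task handles exactly one partition and one input file, and `max_tasks` is simply supplied by the caller; nothing in the sketch chooses that number on the basis of the host system's resources.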
However, conventional techniques for loading databases in parallel do not fully utilize the resources of the hosting parallel database loading system. Because each parallel load process operates on only a single partition and a single input file at a time, the processing capabilities of the particular host system must be taken into account in order to optimize the speed at which databases are loaded. Past loading techniques fail to optimize the tasks required to load a database, and unnecessarily slow processing times result.
Therefore, there is a need in the art for an improved technique for optimizing a number of tasks to be invoked simultaneously in a parallel database loading system.