1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to establishing a point of consistency (i.e., a checkpoint) in a load operation in a parallel database loading system, from which point the load operation may be restarted in case of, for example, failure of the operation.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives for semi-permanent storage.
A table can be divided into partitions, with each partition containing a portion of the table's data. Each partition may reside on a different data storage device. By partitioning tables, the speed and efficiency of data access can be improved. For example, partitions containing more frequently used data can be placed on faster data storage devices, and parallel processing of data can be improved by spreading partitions over different DASD volumes, with each I/O stream on a separate channel path. Partitioning also promotes high data availability, enabling application and utility activities to progress in parallel on different partitions of data.
Some systems have very large databases, storing data on the order of terabytes of information. With the growing use of computers and the increasing variety of data stored on storage devices (e.g., images and audio, as well as large amounts of text), such large databases are becoming more and more common. Loading that amount of data from an input source into a database management system (DBMS) can take many hours. Traditionally, database loading systems (also referred to as “load utilities”) periodically checkpoint their status during the loading process. A checkpoint is a point in a process at which all input/output (I/O) activity is halted and state information is stored. In particular, the state information includes a location in an input file at which loading of data is to be restarted, a location in a tablespace at which data is to be written upon restart, and error information. If an error occurs before the loading is complete, the load utility can be restarted at the last checkpoint, rather than at the beginning of the input file. Since the load utility does not have to start processing from the beginning of the input file, a great deal of time is saved.
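The checkpoint-and-restart behavior of a single load utility described above can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of any particular load utility: the names `Checkpoint` and `load`, the `interval` and `fail_at` parameters, and the use of in-memory lists in place of an input file and a tablespace are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint state, mirroring the three items in the text."""
    input_pos: int = 0   # location in the input at which reading resumes
    table_pos: int = 0   # location in the tablespace at which writing resumes
    errors: list = field(default_factory=list)  # error information so far


def load(records, table, ckpt, interval=3, fail_at=None):
    """Load `records` into `table`, starting from `ckpt`.

    `ckpt` is updated in place every `interval` input records. `fail_at`
    (an assumption for demonstration) simulates a failure at that index.
    """
    # Discard rows written after the last checkpoint; they will be reloaded.
    del table[ckpt.table_pos:]
    pending_errors = list(ckpt.errors)
    for i in range(ckpt.input_pos, len(records)):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at input record {i}")
        record = records[i]
        if record is None:
            pending_errors.append(i)  # treat None as a rejected record
        else:
            table.append(record)
        if (i + 1) % interval == 0:
            # Checkpoint: store the restart locations and error information.
            ckpt.input_pos = i + 1
            ckpt.table_pos = len(table)
            ckpt.errors = list(pending_errors)
    # Final checkpoint once the whole input has been consumed.
    ckpt.input_pos = len(records)
    ckpt.table_pos = len(table)
    ckpt.errors = pending_errors
```

In this sketch, a restart simply calls `load` again with the same `Checkpoint` object: reading resumes at `input_pos`, and any rows written to the table after `table_pos` are dropped first so that they are not loaded twice.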
In an attempt to speed up the loading of data, various approaches have been tried involving the use of parallel processing. Parallel processing exploits the multiprocessor capabilities of modern high speed computers and refers to the use of several processors to load data into different parts of the database in parallel with each other. That is, data is loaded into different partitions of a database by load utilities that are executing concurrently. In particular, the data to be loaded into the database may be separated into multiple input files. Then, a load utility may load data into a tablespace (i.e., read data from an input file and store the data in a tablespace).
However, loading the data in parallel greatly complicates the ability to do checkpoints and restart the load after a failure. With multiple processors reading input data from different input sources and loading the data into different parts of a database, it is difficult to establish a checkpoint that enables a consistent point of restart for the multiple processors. In particular, this requires coordination between all of the processes performing the load. Conventional load utilities that load data in parallel often require that data be reloaded starting from the beginning of a partition, rather than at a checkpoint.
Therefore, there is a need in the art for an improved method of establishing a checkpoint during a load operation in a parallel database loading system.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer-implemented technique for establishing a checkpoint during a load operation in a parallel database loading system.
In accordance with the present invention, under control of a main process, multiple agent load processes are started for loading data in parallel. The main process awaits receipt of a checkpoint signal from each agent load process. Then, upon receiving the checkpoint signal from each agent load process, the main process performs a checkpoint for all agent load processes.
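The coordination summarized above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the disclosed implementation: threads stand in for the agent load processes, a queue carries the checkpoint signals, and the names `agent_load` and `main_load`, the `interval` parameter, and the choice to pause each agent until the main thread has recorded the checkpoint are all hypothetical.

```python
import queue
import threading


def agent_load(agent_id, records, partition, interval, signals, resume):
    """Hypothetical agent: load `interval` records, then signal the main thread."""
    pos = 0
    while pos < len(records):
        batch = records[pos:pos + interval]
        partition.extend(batch)          # "load" the batch into this partition
        pos += len(batch)
        signals.put((agent_id, pos))     # checkpoint signal with input position
        resume[agent_id].wait()          # pause until the checkpoint is taken
        resume[agent_id].clear()


def main_load(inputs, interval=2):
    """Start one agent per input; checkpoint once every active agent has signaled."""
    n = len(inputs)
    partitions = [[] for _ in range(n)]
    signals = queue.Queue()
    resume = [threading.Event() for _ in range(n)]
    positions = [0] * n
    checkpoints = []                     # one consistent snapshot per round
    threads = [
        threading.Thread(
            target=agent_load,
            args=(i, inputs[i], partitions[i], interval, signals, resume),
        )
        for i in range(n)
    ]
    for t in threads:
        t.start()
    # Agents with empty input never signal, so they are not waited on.
    active = {i for i in range(n) if inputs[i]}
    while active:
        # Await a checkpoint signal from each agent that is still loading.
        for _ in range(len(active)):
            agent_id, pos = signals.get()
            positions[agent_id] = pos
        # All signals received: perform one checkpoint covering all agents.
        checkpoints.append(tuple(positions))
        for agent_id in list(active):
            if positions[agent_id] == len(inputs[agent_id]):
                active.discard(agent_id)  # this agent has finished its input
            resume[agent_id].set()        # let the agent continue
    for t in threads:
        t.join()
    return partitions, checkpoints
```

For example, `main_load([[1, 2, 3, 4, 5], [10, 20, 30], ["x"]], interval=2)` records one snapshot of all agents' input positions per round. Because each agent pauses after signaling, every snapshot describes a state in which no agent is mid-write, which is what makes it usable as a common restart point.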