1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to reducing very large tables to optimize the execution of a plurality of actions in a parallel processing database system.
2. Description of Related Art
Relational DataBase Management Systems (RDBMS) store data in tables. A table in a relational database is two-dimensional, comprising rows and columns. Each column has a name, typically describing the type of data held in that column. As new data is added, more rows are inserted into the table. As data is changed, rows are updated. A user query selects some rows of the table by specifying clauses that qualify the rows to be retrieved based on the values in one or more of the columns. These changes and queries are referred to as actions against the table.
With the advent of data warehouses, it is not uncommon for relational databases to store very large tables. Such tables may range from megabytes to gigabytes, terabytes, or more. As a result, the RDBMS may have to examine thousands, millions, billions, or more records to satisfy each action. In the prior art, the necessary records would be retrieved from the table once per action. Often, however, it may be possible to apply one or more of the actions to reduce the number of records that must be examined before the remaining actions are applied. The advantage is that the table size and record counts for the subsequent actions could be greatly reduced, resulting in faster execution using fewer resources and thereby improving response time and data throughput.
While there have been various techniques developed for optimizing the performance of RDBMS, there is a need in the art for techniques that optimize the performance of user queries by reducing the size of very large tables.
The present invention discloses a method, apparatus, and article of manufacture for accessing a subject table in a computer system. The subject table is partitioned across a plurality of processing units of the computer system. A user query or other request to access the subject table is split into a plurality of step messages, wherein each of the step messages is assigned to one of the processing units managing one or more of the partitions of the subject table. One or more actions are identified for each of the step messages, and one or more necessary records for these actions are retrieved from the partition of the subject table and stored into a corresponding partition of a spool table. The necessary records are selected such that only one of the actions involved in the request needs to access the partition of the subject table. The remaining actions are then performed against the partitions of the spool table rather than the partitions of the subject table. An optimizer function uses information from the spool table to generate more efficient execution plans for the step message and its associated actions.
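The flow summarized above can be illustrated with a small Python sketch. It is a minimal model under stated assumptions, not the claimed implementation: each action is assumed to be a row-level predicate, the "necessary records" are assumed to be the rows satisfying at least one action (the summary does not state the selection rule), threads stand in for processing units, and the optimizer function is not modeled. The names `build_spool_partition`, `run_step_message`, and `execute_request` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical model: an action is a predicate that qualifies rows.
Action = Callable[[Row], bool]


def build_spool_partition(partition: List[Row], actions: List[Action]) -> List[Row]:
    """Scan the subject-table partition once, keeping every row any action needs."""
    return [row for row in partition if any(action(row) for action in actions)]


def run_step_message(partition: List[Row], actions: List[Action]) -> List[List[Row]]:
    """Work performed by one processing unit for its step message."""
    spool = build_spool_partition(partition, actions)
    # Every action is then evaluated against the (smaller) spool partition,
    # never against the subject-table partition again.
    return [[row for row in spool if action(row)] for action in actions]


def execute_request(partitions: List[List[Row]], actions: List[Action]) -> List[List[Row]]:
    """Split the request into one step message per partition and run them in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_unit = list(pool.map(lambda p: run_step_message(p, actions), partitions))
    # Merge each action's results across processing units, in partition order.
    return [[row for unit in per_unit for row in unit[i]] for i in range(len(actions))]


if __name__ == "__main__":
    subject = [
        [{"id": 1, "region": "east", "amount": 50}, {"id": 2, "region": "west", "amount": 500}],
        [{"id": 3, "region": "east", "amount": 900}, {"id": 4, "region": "north", "amount": 10}],
    ]
    actions = [lambda r: r["region"] == "east", lambda r: r["amount"] > 400]
    for i, rows in enumerate(execute_request(subject, actions)):
        print(i, [r["id"] for r in rows])
```

In this example, the spool partition built from the second subject partition holds one row instead of two, so each action after the single scan examines fewer records. Results are identical to evaluating every action directly against the subject table.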
An object of the present invention is to optimize database access on parallel processing computer systems. Another object of the present invention is to improve the performance of database partitions managed by a parallel processing computer system.