1. Field of the Invention
This invention relates to database processing.
2. Related Art
Database processing is a widely known and commercially significant aspect of the data processing field. A database is a collection of data organized into records. Computer operations which are performed on and/or using these data records are referred to as database processing. A database manager is the package of database processing operations used to access and manipulate the database.
Database processing commonly involves operations such as sorting, merging, searching and joining. Set operations such as union, intersection and difference are also common aspects of database processing. Conventionally, these operations have been performed by software algorithms requiring significant amounts of processor execution time. In large databases, the processor execution time can be prohibitive, significantly degrading the performance of software database managers.
One aspect of database processing involves searching for records that meet certain criteria. For example, a search might scan all records in the database and return every record that has column "X" equal to some value. In a more complex case, the search might scan and retrieve records based on a predicate equation involving many columns and values. Since databases can be very large (e.g., hundreds of millions of records), the searching algorithms used in conventional software database management systems can take hours to process.
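By way of illustration only, the two kinds of search described above can be sketched in software as a full scan that tests every record against a predicate. The record layout, the column names "X" and "Y", and the function name scan are hypothetical and are not taken from any particular database manager.

```python
# Hypothetical records; the column names are illustrative only.
records = [
    {"id": 1, "X": 5, "Y": "a"},
    {"id": 2, "X": 7, "Y": "b"},
    {"id": 3, "X": 5, "Y": "b"},
]


def scan(rows, predicate):
    """Full scan: every record is tested, so the work grows with table size."""
    return [row for row in rows if predicate(row)]


# Simple case: column "X" equal to some value.
simple_hits = scan(records, lambda r: r["X"] == 5)

# More complex case: a predicate involving several columns and values.
complex_hits = scan(records, lambda r: r["X"] == 5 and r["Y"] == "b")
```

Because every record is visited once per search, the processor time of such a software scan is proportional to the number of records, which is the cost the passage above describes for very large databases.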
Another aspect of database processing involves the extraction of sort keys and search fields from the database records. In sorting and searching applications, the sort key fields or the fields on which a search is defined are often spread throughout the records and must be extracted into a usable format before the operation can proceed. For sorting, this means getting each key field and concatenating the fields in precedence order so that they can be treated as one large key. For searching, the columns of a table that are being searched or selected must be extracted from the rows. Conventionally, these and many other database operations can use significant amounts of CPU time.
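As a minimal sketch of the key extraction described above, the following example gathers key fields that are scattered through a record and concatenates them, in precedence order, into a single byte string that can be compared as one large key. The field names, the fixed field widths, and the helper extract_key are assumptions made for illustration; the encoding shown handles only ASCII text and non-negative integers.

```python
# Hypothetical records; field names and widths are illustrative assumptions.
records = [
    {"name": "Lee", "dept": "B", "year": 1990},
    {"name": "Kim", "dept": "A", "year": 1992},
    {"name": "Ames", "dept": "B", "year": 1988},
]

# Key fields listed in precedence order as (field name, width in bytes).
KEY_FIELDS = [("dept", 4), ("year", 2)]


def extract_key(row, fields):
    """Concatenate the key fields into one fixed-width, byte-comparable key."""
    parts = []
    for name, width in fields:
        value = row[name]
        if isinstance(value, int):
            # Big-endian bytes compare in numeric order (non-negative only).
            parts.append(value.to_bytes(width, "big"))
        else:
            # Pad text to a fixed width so later fields stay aligned.
            parts.append(value.encode("ascii").ljust(width, b"\x00"))
    return b"".join(parts)


sorted_rows = sorted(records, key=lambda r: extract_key(r, KEY_FIELDS))
```

Once the fields are concatenated in this way, the sort itself needs only a single comparison per pair of records, but the extraction step must still touch every record before sorting can begin.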
There have been a number of attempts to provide solutions which "off-load" some of the database processing tasks from the CPU. One such solution was to provide a computer system with a specialized sort processor to handle sorting tasks which would otherwise be executed by the CPU. FIG. 1 is an illustration of such a prior art sort processor.
The sort processor 100 of FIG. 1 shares a main memory bus 102 with a central processing unit 104. A main memory 106 communicates with the central processing unit 104 through a first set of communications lines 108 and with the sort processor 100 through a second set of communications lines 110. The central processing unit 104 and the sort processor 100 are interconnected by a start control line 112 and interrupt line 114.
During database processing, random records stored in a peripheral file 116 are transferred to the main memory 106. A control program then initiates the action of the sort processor 100. Under firmware control, the key words are transferred from the main memory 106 to the sort processor 100 and then sorted. The autonomous action of the sort processor 100 results in savings of central processing unit time and in simplification and reduction of programming effort.
While the sort processor 100 of FIG. 1 provided an advance over contemporaneous software database processors, the solution was only partial. Sorting is but one aspect of database processing. Other operations such as searching, joining of tables and set operations also take up significant amounts of CPU time. Further, although the sort processor of FIG. 1 operated relatively autonomously from the remainder of the system, its internal sorting algorithm was implemented in microcode. Thus the sort processor of FIG. 1 was limited by many of the same constraints as sort programs which would otherwise be executed by the CPU.
The sort processor 100 of FIG. 1 is also subject to performance limitations in that it shares the main memory 106 with the central processor 104 on a low priority basis and along a single memory bus 102. The competition for main memory access time along the single bus path (albeit reduced relative to systems having no sort processor) can cause a bottleneck for data traffic between the main memory 106 and the sort processor 100.
Another approach to database processing off-loads some of the database processing tasks traditionally handled by the CPU to a vector processing element. FIG. 2 is an illustration of one such prior art relational database managing system utilizing a vector processor. A central processor 200 includes a scalar processor 202 and a vector processor 204. Both the vector and scalar processors have access to a main memory 206 and a subsidiary storage 208.
In operation, a database command issued from an application program 210 is examined by a relational database managing program 212. The database managing program 212 identifies the command, analyzes it, and then generates codes which designate a determined process sequence. During database processing, required data is loaded into a data buffer area 214 in the main memory 206 from a page area in the subsidiary storage 208. Under control of the process sequence codes, the data in the buffer area 214 is rearranged in the form of a vector structure. The resultant vector data elements are stored in a vector area 216.
When the process sequence codes indicate the processing of the vector data, a sequence of vector instructions is executed by the vector processor 204. Under control of the vector instructions, the vector processor 204 processes the data in the vector area 216 in accordance with the designated process sequence.
One constraint of the database managing system of FIG. 2 is that, as a consequence of its reliance on vector processing, the database must be generated in or converted to a vector format prior to being processed. This conversion process can, in itself, take a significant amount of time and can slow the database processing operation.
A second constraint of the database managing system of FIG. 2 is that the associated vector processing hardware is programmed at the instruction level (i.e. it is left to the programmer to write the instruction primitives required to perform the database processing functions). These instructions are executed by the vector processing hardware synchronously within the database management program. Thus, the database manager program is dedicated to a single task while the instructions are being executed. Further, as each primitive instruction is executed it requires CPU time. The CPU is, therefore, not completely free of database processing duties until the entire series of primitives is executed.
Yet a third constraint of the database managing system of FIG. 2 is that the vector processing approach adds levels of indirection to the database processing operation. In vector formatted databases, a series of pointers is used to locate the actual data of interest. A location in one table will often point to another location in another table and so on. This indirection can increase the processing time of database operations.
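The cost of such indirection can be illustrated with a small sketch. The three tables below and the names index_table, pointer_table, and data_area are hypothetical and do not reproduce the actual data layout of the system of FIG. 2; they show only that each added level of pointers adds a dependent lookup before the data of interest is reached.

```python
# Hypothetical layout: each table holds positions in the next table.
index_table = [2, 0, 1]      # logical position -> slot in pointer_table
pointer_table = [1, 2, 0]    # slot -> offset in data_area
data_area = ["carol", "alice", "bob"]


def fetch_direct(offset):
    """One access: the offset of the data is already known."""
    return data_area[offset]


def fetch_indirect(position):
    """Three dependent accesses: each lookup must finish before the next."""
    slot = index_table[position]
    offset = pointer_table[slot]
    return data_area[offset]
```

In this sketch a direct fetch costs one access while an indirect fetch costs three, and the second and third accesses cannot start until the preceding one has returned.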