The "join" operation used in database systems is the fundamental operation that allows information from different data tables to be combined in a selected way. Tables consist of collections of data grouped by a common subject matter, such as names or ages. The join operation allows a user to combine selected groups of data from multiple input data tables according to a specified condition between the records in each table to be combined. For example, a user may wants to join two tables, the first containing names and telephone numbers, and the second containing names and addresses, to produce a result which contains all the records that include common names between the two tables with both their telephone numbers and addresses. The join operation is used during a "query", or data request operation, by the user of a database system. Join operations are typically expensive in terms of processing time making efficiency a critical component to performance of the database system.
In some cases, tables of data will be very small relative to the amount of main memory (also called random access memory or "RAM") of the computer in the data base system. An example of a relatively small table is one that contains data of 30 students names and their quarterly grades. Tables are normally organized by columns of related data. In this case, all the names would appear in one column entitled "Names". In this example of 30 records, the entire table can be read into a typical main memory of a computer in one operation and can be completely processed while contained in RAM.
Other applications have input data tables that are significantly larger than the storage capacity of available main memory of the processor. These applications need to process huge amounts of information very quickly to keep up with vast amounts of input data. Examples of these applications are NASA's Earth Observing System, with an estimated 1 terabyte of data to be processed per day, and data mining applications which contain massive amounts of transaction information. As technology advances, more and more tables of large amounts of data will be needed to be processed in a database platform. Even considering the typical amounts of available RAM storage in large main frame computers, the size of their main memory is significantly smaller than the size of the vast input tables of data described above. In order to process such large amounts of data using a database system, efficient techniques for joining large relations are needed when the processor utilizes a smaller main memory.
Join results from joining input tables can be computed in a number of different ways. The term "ad-hoc join" is used to describe the process of taking two input tables in a database and forming the join result by processing the entire standard representations of each table, without the benefit of any pre-computed special data structures such as indices. When the pre-computed special data structures are present, the join operation will perform more efficiently.
One such pre-computed access structure is called a join index. The join index was introduced by Valduriez in "Join Indices", ACM Transactions on Database Systems, 12(2):218-246, 1987. A join index between two input tables maintains pairs of identifiers for records that would match if a particular join operation is performed. For example, one entry in a join index might be (1,3) which indicates that the first record in the first table will be joined with the third record in the second table when a particular join operation is performed. The join index may be maintained by the database system, and updated when records are inserted or deleted in the underlying tables. In situations where joins take place often, the processing cost of maintaining a join index will be small compared to the savings achieved in performing the join operations using the join index.
Valduriez proposes a technique to perform join operations using a join index in his article. The described method for performing the join operation requires a large amount of main memory (see Valduriez article, page 223). The four steps of the Valduriez method for performing the join operation using a join index on two input tables designated A and B stored in a database are: (1) read in a portion of the join index (previously constructed) and selected matches between the join index entries and the records in A which will fit into available main memory; (2) internally sort the records from A and the join index entries by their table B identifiers; (3) select matches between the join index entries and the records in B outputting the resulting record; and (4) read in the next portion of the join index and the selected records of A. Repeat steps 2, 3 and 4 until all index entries in the join index are processed.
The Valduriez method has significant drawbacks when the input tables are large and the processor uses a relatively small main memory. The primary detriment is that there is a large amount of repetitious Input/Output ("I/O") cycles which must be performed during the join operation. Blocks of records in input Table B are often accessed numerous times when records located in the same disk block are indicated throughout the join index for Table B. The access repetition is caused by only processing a portion of the join index at one time. The Valduriez method is also only described for a maximum of two input tables being joined at one time.
An important consideration in measuring the efficiency of a join operation is the number and kind of disk accesses performed during that operation. The "cost" (processing time) of I/O operations becomes increasingly important when the tables to be joined are very large. A typical access of one block (typically 8000 bytes) of data stored on a disk using a conventional Fujitsu M2266 one gigabyte drive is about two milliseconds. The cost of rotational latency (moving the disk in a circular motion to the specified address) for that drive is about eight milliseconds. The seek time (moving the disk head up and down to the proper location on the disk) is about 16 milliseconds. Other drives have proportionate access times. While these costs are seemingly small when joining tables small in size, the costs increase drastically when the input tables are very large on the order described above. It then becomes increasingly important to minimize overall I/O operations to reduce the number of overall seek/rotational latencies caused by I/O operations and correspondingly reduce the operational cost ratio of seek/rotational latencies per block of data transferred.