The "join" operation used in database systems is the fundamental operation that allows information from different data tables to be combined in a selected way. Tables consist of collections of data grouped by a common subject matter, such as names or ages. The join operation allows a user to combine selected groups of data from multiple input data tables according to a specified condition between the records in each table to be combined. For example, a user may want to join two tables, the first containing names and telephone numbers, and the second containing names and addresses, to produce a result which contains all the records that include common names between the two tables with both their telephone numbers and addresses. The join operation is used during a "query", or data request operation, by the user of a database system. Join operations are typically expensive in terms of processing time making efficiency a critical component to performance of the database system.
In some cases, tables of data will be small relative to the amount of main memory (also called random access memory or "RAM") of the computer in the database system. An example of a relatively small table is one that contains data of 30 students names and their quarterly grades. Tables are normally organized by columns of related data. In this case, all the names would appear in one column entitled "Names". In this example of 30 records, the entire table can be read into and stored in a typical main memory of a computer at one time and can be completely processed while contained in RAM.
Other applications have input data tables that are significantly larger than the storage capacity of available main memory of the processor. These applications need to process huge amounts of information very quickly to keep up with vast amounts of input data. Examples of these applications are NASA's Earth Observing System, with an estimated 1 terabyte of data to be processed per day, and data mining applications which contain massive amounts of transaction information. Data warehousing for large organizations or companies requires storing vast amounts of sales and inventory information which must be accessed quickly and efficiently by decision support systems in order to determine profit levels, analyze costs or perform other information processing. As data collecting, data storage and processing technology advances, increasing numbers of tables containing large amounts of data will be required to be processed using join operations in a database platform. Even when considering the typical amounts of available RAM storage in large main frame computers, the size of their main memory is significantly smaller than the size of the vast input tables of data described above. Thus some intermediate results will have to be stored during processing on secondary storage devices such as disks. The input/output ("I/O") disk access is much slower than the speed of main memory. Therefore, minimizing disk I/O is critical to achieving good performance. As a result, in order to process such large amounts of data using a database system, efficient techniques for joining large relations are needed.
An important consideration in measuring the efficiency of a join operation is the number and kind of disk accesses performed during that operation. The "cost" (processing time required) of I/O operations becomes increasingly important when the tables to be joined are very large. A typical access of one block (typically 8000 bytes) of data stored on a disk using a conventional Fujitsu M2266 one gigabyte drive is about two milliseconds. The cost of rotational latency (moving the disk in a circular motion to the specified address) for that drive is about eight milliseconds. The seek time (moving the disk head up and down to the proper location on the disk) is about 16 milliseconds. Other drives have proportionate access times. While these costs are seemingly small when joining tables small in size, the costs increase drastically when the input tables are very large on the order described above. It then becomes increasingly important to minimize overall I/O costs by reducing the number of blocks transferred as well as the total number of seek and rotations involved in data transfer.
Join results from joining input tables can be computed in a number of different ways. The term "ad-hoc join" is used to describe the process of taking two input tables in a database and forming the join result by processing the entire standard representations of each table, without the benefit of any pre-computed special data structures such as indices. When the pre-computed special data structures are present, the join operation may perform more efficiently.
One such pre-computed access structure is called a join index. The join index was introduced by Valduriez in "Join Indices", ACM Transactions on Database Systems, 12(2):218-246, 1987. A join index between two input tables maintains pairs of identifiers for records that would match if a particular join operation is performed. For example, one entry in a join index might be (1,3) which indicates that the first record in the first table will be joined with the third record in the second table when a particular join operation is performed. The join index may be maintained by the database system, and updated when records are inserted or deleted in the underlying tables. In situations where joins take place often, the processing cost of maintaining a join index will be small compared to the savings achieved in performing the join operations using the join index.
Valduriez proposes a technique to perform join operations using a join index in his article. The described method for performing the join operation requires a large amount of repetitive disk I/O (see Valduriez article, page 223). The four steps of the Valduriez method for performing the join operation using a join index on two input tables designated A and B stored in a database are: (1) read in a portion of the join index (previously constructed) and selected matches between the join index entries and the records in A which will fit into available main memory; (2) internally sort the records from A and the join index entries by their table B identifiers; (3) select matches between the join index entries and the records in B outputting the resulting record; and (4) read in the next portion of the join index and the selected records of A. Repeat steps 2, 3 and 4 until all index entries in the join index are processed.
The Valduriez method has significant drawbacks when the input tables are large and the processor uses a relatively small main memory. The primary detriment is that there is a large amount of repetitious I/O operations which must be performed during the join operation. Blocks of records in input Table B are often accessed numerous times when records located in the same disk block are indicated throughout the join index for Table B. The access repetition is caused by only processing a portion of the join index at one time. The Valduriez method is also only described for a maximum of two input tables being joined at one time.
A technique called "jive-join" for joining multiple tables together using a join index is discussed by Ross in U.S. Pat. No. 5,666,525. The jive-join technique utilizes separate temporary files of record identifiers in order to allow separate processing for each of the input tables. Furthermore, vertical partitioning of the output files stores the output records for each input table separately. By separately processing each input table, the records from each table need only to be processed once thus reducing the disk accesses performed and rotational latencies. However, the jive-join technique requires that one input table be processed before the other tables in order to allocate the buffers. This requirement also necessitates that the join index be pre-sorted by the join index record identifiers for the first processed input table. It is more efficient in the jive-join technique to assign the largest input table as the first table which is processed. Additionally, if a column in a table is to be joined with another column in the same table (a "selfjoin"), the input table must be read separately for each column processed.
Another technique for joining multiple tables together using a join index is called "slam-join" and is discussed by Ross and Li in U.S. Pat. No. 5,802,357. The slam-join technique uses a parallel-merge operation to accomplish the join operation but requires that the last table in the join be processed separately and after the other tables. The slam-join technique also requires that an input table must be read separately for each column processed while performing a self-join operation. The join index must be pre-sorted by the record identifies for the first table to be joined. It is advantageous with slam-join technique to identify the largest input table and designate that table as the last table to be processed in the join operation.
It would be advantageous to have a join technique which would perform the join operation irrespective of the join index order, could process all the input tables simultaneously and could perform a self-join operation without reading the input table multiple times.