This invention relates to a method for performing joins in a database system.
In general, a database comprises a number of records, each of which is of one of a number of record types, each record comprising a number of fields. The database may also contain one or more indexes, allowing records to be accessed on the value of a particular field, referred to as a key. Alternatively, the database may have a hashed access mechanism for accessing records on their keys.
Furthermore, certain database systems, referred to as hierarchic, have the concept of ownership of one record type by another. A record can own a group or set of records of a different type, and is linked to those records by a chain of pointers embedded in the records. Sometimes a local index takes the place of the pointers, but the principle is the same.
A database may be accessed by means of a query language, such as the well-known SQL (Structured Query Language). As an example, consider the following SQL query:
SELECT CUSTOMER.sub.-- NAME, PRODUCT.sub.-- NAME, QUANTITY PA1 FROM CUSTOMER, ORDER PA1 WHERE CUSTOMER.ORDER.sub.-- NO=ORDER.ORDER.sub.-- NO; PA1 (a) processing the query to create a data structure comprising a plurality of nodes, linked together to form a plurality of chains, each of said nodes representing one of said data types, said processing comprising: PA1 (b) constructing a virtual row of said join by accessing said nodes in sequence and, for each of said nodes, reading a record of the record type represented by that node; and PA1 (c) testing said virtual row to determine whether it satisfies said database query.
This assumes that the database contains two record types CUSTOMER and ORDER, the CUSTOMER record type containing fields ORDER.sub.-- NO, CUSTOMER.sub.-- NAME and ORDER.sub.-- NO, and the ORDER record type containing fields ORDER.sub.-- NO, PRODUCT.sub.-- NAME and QUANTITY. The WHERE clause specifies that the ORDER.sub.-- NO in the CUSTOMER record is to be equal to the ORDER.sub.-- NO in the ORDER record. Execution of this example SQL query would return to the user the data items named in the SELECT clause from all pairs of CUSTOMER and ORDER records which satisfy the WHERE clause.
Because the query refers to more than one record type, the records are said to be joined. Each combination of records that satisfies the WHERE clause condition is referred to as a virtual row of the join of the two record types.
There are many ways of organising a join to produce this result. In a very primitive database system with no indexes or keys, it would be necessary to pair together all possible combinations of CUSTOMER and ORDER records, and to apply the WHERE clause to each combination to decide whether or not to return results to the user. This is referred to as forming the Cartesian product, and is very inefficient because it involves, in principle, n*m record accesses, where n is the number of records of one type and m is the number of records of the other type. For more than two record types, the number of accesses can be astronomical, unless the database is small.
If the customer record has an index of which CUSTOMER.ORDER.sub.-- NO is the key, the situation is somewhat better. All the ORDER records must be read, but because they contain the key of the corresponding CUSTOMER record, it is possible to go straight to that record and avoid reading the records of customers which have no orders. In principle the number of records accessed will be 2*n, where n is the number of orders, though the actual number would vary according to the numbers of records present and the characteristics of the database. Hashed access behaves in a similar way to indexed access; these two access methods are common in relational database systems.
In a hierarchic system, customers and orders might be linked by a set, CUSTOMER being the owner and ORDER the member. The most efficient way of accessing these records would then be to access all the CUSTOMER records in turn, and follow the set pointers to the ORDER records which belong to it. The number of accesses would in principle be m*the average number of records in the set (where m is the number of customer records). This can be very efficient, especially if advantage is taken of a facility to locate member records in the same "page" or disc block as the owner record, in which case one disc access may retrieve the whole set.
The object of the invention is to provide an improved method of processing a search query involving a join, which exploits these possibilities for improving the efficiency of the join.