There are two database models currently in common use: the relational model and the object-oriented model. Both models organize the database into various database records.
In the relational model each record belongs to a table of a type specified by the database designer. There could be a customer table or an invoice table, for example. Every record in a particular table has the same structure, i.e., each record has the same fields. These fields are considered to be simple or atomic (i.e., fixed as integer, character, or date type fields). Fields cannot be, for example, a list of dates, or a set of integers. Tables of the relational model have a flat structure.
In the object-oriented database model a database record belongs to a database class, as opposed to a table in the relational model. A record is an "instance" of a class. The database records can be complex (i.e., the record is not limited to flat structures). A record attribute (i.e., field) can be a list, a set, an embedded structure, a list of structures, etc. An example of a list of structures would be a list of previous addresses where the address field itself has a structure such as: street address, city, zip code. In the object-oriented model records can vary in size. It would be possible to have a list of indefinite size, for example. Different records can have a variety of numbers of previous addresses, in the previous address list case.
The object-oriented model also adds the concept of subclassing. Here classes can be subclasses of other classes. A doctor class could have a pediatrician subclass and a podiatrist subclass. When a database record is from a class that is a subclass, it inherits the attributes (fields) of the super-class. The size of a record from the pediatrician subclass will generally be longer than a record from the doctor class.
To satisfy requests for information from a database, data from more than one database record often has to be combined. The object-oriented and relational models do this in different ways.
In the object-oriented model data is combined from two or more database records by "pointer navigation" (also termed "pointer chasing"). FIG. 1 is a representation of the pointer navigation method. FIG. 1 illustrates a customer object 10, an order object 12, and an invoice object 14. There is an explicit pointer in one database record that points to another database record if there is a relationship between the two records. In FIG. 1, a pointer 16 defines a relationship between the customer object 10 and order object 12, and a pointer 18 defines a relationship between the order object 12 and the invoice object 14. The database management system, the software controlling the database, maintains the pointers 16 and 18 and uses them to retrieve related data from the two records.
Pointer chasing methods allows for fast retrieval of data. However, the pointer relationship must exist between two particular records for such retrieval to be possible. If a pointer relationship has not been previously established and maintained, no retrieval is possible. Many of the relationships between class instances are generally known at the time of the design of the database. In these cases the fast retrieval afforded by pointer chasing can be used. However, the pointer chasing method can not be used for ad hoc queries where the relationship of interest is an unexpected relationship. In general, object-oriented databases have generally weak query facilities and do not fully support industry standard Structured Query Language (SQL).
Relational databases have greater flexibility with ad hoc queries. When relational databases combine data from the records of two tables they "join" the two tables. The result of such a join is a result table that can then be joined to another table, producing another result table, etc. Each relational table is independent in the sense that there are no explicit pointers from a record in one table to a record in another table. FIG. 2 illustrates the same entities 10, 12, and 14 discussed in conjunction with FIG. 1, however, no explicit pointer definitions between the entities 10-14 are defined.
In contrast, a relational join operates by specifying a condition between the values of the fields of one table against the values of the fields of another table. An "equi-join" for example, could join records of a customer table with those of an order table by joining on customer number. Each order record would have the customer number and the customer number would also be in a customer record. The standard SQL query language would specify: EQU SELECT * FROM customer, order WHERE
customer.customer.sub.-- number=order.customer.sub.-- number;
where * means return all fields from all tables listed after the FROM command.
There are three basic processes used to join data in a relational database. The most common process is the nested loop operation where one table is used as the outer loop while the second table is used as the inner loop. A record is taken from the outer loop table and its values are checked against the values of each record of the inner loop table in succession. When a comparison passes the specified conditions, a result table record is written. After a full inner loop has occurred then the join operation moves to the next record in the outer loop. Indexes can improve the performance of the nested loop join. Rather than read though all the records of the inner loop, the comparison can look in the index for records that satisfy the join condition with the outer loop record.
The second process of joining data uses a hash join. This is generally used for relatively small tables where all the records can be cached in memory by hashing to a specific location in memory based on a record field.
The third process sorts and merges the two tables. This method is very inefficient for large tables.
The processes described above that are used for the relational join take advantage of the flat tabular structure of the relational model. With every record in a table having the same structure, each field is at the same displacement to quickly check field values.
In summary, the object-oriented database model is powerful in providing the ability to build complex fields to model real data more closely, and to improve programming efficiency and flexibility, however the query capabilities are generally limited to pointer chasing thereby severely limiting ad hoc query capabilities. The relational database model has very strong ad hoc querying capabilities, but is limited in terms of flexibility by the inherent field restrictions.
Recently extended-relational databases (also called object-relational, e.g. Informix.TM./Illustra.TM.) have been developed to try to overcome the shortcomings of the relational model. These databases allow some complex data types but use traditional techniques for join processing. Likewise some object-oriented databases have been developed to decompose the complex objects in their databases into flat relational tables so that ad hoc relational queries can be processed. However, the decomposition of complex objects increases processing loads thereby limiting the advantages of the ad hoc query facility.
Message-based joins for object-oriented databases have been discussed by Tanaka, K. et al. in "On Natural Joins in Object-Oriented Databases", Deductive and Object-oriented Databases, Elsevier Science Publishers, 1990, incorporated herein by reference. A natural join is a special case of a join--where tuples have one or more fields with the same names and values and the results are reduced so that there is only tuple per two joined tuples. Tanaka et al. do not provide solutions for efficient processing for object composition operations including those not limited to natural joins in databases. Specifically, Tanaka et al. indicate in their paper that further research will be needed to develop efficient processing techniques for object composition operations including the natural join discussed in the paper.
There is a need for a method of joining complex objects efficiently in a database management system.