Object oriented software development within computer systems, while no longer new, is still growing. Techniques and methodologies have not yet become standardized and object oriented databases are lagging behind object oriented software development in other areas. Direct storage of objects on nonvolatile mass storage is not widely available.
Relational database technology is mature and widely available, with a choice of vendors for almost any development platform. Query languages and access techniques are standardized, and optimization of database structure is well understood. Additionally, new applications may have to integrate with legacy data already stored in existing relational databases.
This combination of circumstances leads a large number of object oriented software developers to rely on conventional relational databases for data storage. Using conventional relational databases within an object oriented software environment, however, has its own drawbacks. First among these is that the structure of the object oriented environment and the structure of the relational database environment are different.
Conceptually, an object is an encapsulated set of data fields along with the processing functions that operate on data contained within the fields. Objects are organized into classes where all objects of a class share a common pattern of data fields and processing functions. Each individual object of a class has its own identity that differs from other objects of the same class, and typically has unique values stored in its data fields. Logical relationships between objects are often implemented by using pointers to provide direct access from one object to another object.
Relational databases use a flat, tabular format to store data. Data is partitioned into tables and then into columns within the tables. A particular set of data is stored as a row within a table, or the set may be split into two or more rows, with each row stored in a different table.
Within a relational database, no provision is made for storing any of the processing functions that operate on the data stored within the relational database. Relationships between the data tables are implemented by using corresponding columns, known as key fields, in the separate tables. A matching value within corresponding columns of two tables indicates related row entries between the tables.
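To make the key-field mechanism concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `department` and `employee` tables, their columns, and the sample rows are hypothetical. The sketch shows only that a relationship is recorded as matching values in corresponding columns and is followed with a join.

```python
import sqlite3

# Two hypothetical tables related through a key field: each employee row
# stores the id of its department row instead of a pointer to an object.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- key field
    );
    INSERT INTO department VALUES (10, 'Research'), (20, 'Sales');
    INSERT INTO employee VALUES (1, 'Ada', 10), (2, 'Grace', 10), (3, 'Linus', 20);
""")

# A matching value in the corresponding dept_id columns marks related rows;
# the join follows that relationship.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

for emp, dept in rows:
    print(emp, "->", dept)
```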
Because of this difference in structure between the object oriented environment and the relational database environment, an object cannot be stored directly in a relational database. Instead, some mapping technique must be applied to convert between objects and database tables. Some of these techniques are well known, and some have been automated. One technique is to use a table for a class of objects, define a column in the table that corresponds to each data field in the class, and store the values for each individual object as a row of the table. Pointer references between objects are converted into key field values for storage in the database. The functions for the objects are stored separately, usually as part of the software program that is executed to perform the processing. Existing mapping techniques result in a poor relational database model, a poor object model, or limited use of the ability of the relational database management system to retrieve precisely the needed data with a single query.
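The table-per-class mapping described above can be sketched as follows. The sketch assumes Python dataclasses for the objects and an in-memory SQLite database for storage. The class names, table names, and the `save_department`/`save_employee` helpers are hypothetical. Only data fields are written to the database, the methods stay in the program, and the object pointer is replaced by the key of the referenced object.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Department:
    dept_id: int
    name: str


@dataclass
class Employee:
    emp_id: int
    name: str
    department: Optional[Department]  # pointer to another object


conn = sqlite3.connect(":memory:")
# One table per class, one column per data field (hypothetical schema).
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")


def save_department(d: Department) -> None:
    # Each object becomes one row; no methods are stored.
    conn.execute("INSERT INTO department VALUES (?, ?)", (d.dept_id, d.name))


def save_employee(e: Employee) -> None:
    # The pointer is replaced by the key field value of the referenced object.
    key = e.department.dept_id if e.department is not None else None
    conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (e.emp_id, e.name, key))


research = Department(10, "Research")
save_department(research)
save_employee(Employee(1, "Ada", research))
save_employee(Employee(2, "Grace", None))

stored = conn.execute("SELECT * FROM employee ORDER BY emp_id").fetchall()
print(stored)  # [(1, 'Ada', 10), (2, 'Grace', None)]
```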
Database access from within the object oriented environment is typically implemented either by embedding query language statements within the functions of the class of objects or by calling library routines from those functions to retrieve or store data. The embedded approach requires that the programmer know both the database query language and the development language for the objects, while the use of library routines often limits the query capability to the subset implemented by the library.
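The two access styles can be contrasted in a small sketch, again assuming Python and SQLite. `fetch_where` is a hypothetical library routine, not a real API. It hides SQL from the caller but supports only equality filters. `Employee.top_earners`, by contrast, embeds SQL directly in the class and can therefore express ordering and limits.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ada", 120), (2, "Grace", 150), (3, "Linus", 90)])


def fetch_where(table: str, **equals) -> List[tuple]:
    """Hypothetical library routine: callers never write SQL, but only
    equality filters on columns can be expressed."""
    # Table and column names are assumed to come from trusted code;
    # only the values are passed as query parameters.
    where = " AND ".join(f"{col} = ?" for col in equals) or "1 = 1"
    sql = f"SELECT * FROM {table} WHERE {where}"
    return conn.execute(sql, tuple(equals.values())).fetchall()


class Employee:
    def __init__(self, emp_id: int, name: str, salary: int) -> None:
        self.emp_id, self.name, self.salary = emp_id, name, salary

    @classmethod
    def top_earners(cls, limit: int) -> List["Employee"]:
        # Embedded approach: full query power, but the programmer must
        # know SQL as well as the object development language.
        rows = conn.execute(
            "SELECT emp_id, name, salary FROM employee ORDER BY salary DESC LIMIT ?",
            (limit,))
        return [cls(*r) for r in rows]

    @classmethod
    def by_name(cls, name: str) -> List["Employee"]:
        # Library approach: no SQL here, but a query such as
        # "top N by salary" cannot be expressed through fetch_where.
        return [cls(*r) for r in fetch_where("employee", name=name)]


print([e.name for e in Employee.top_earners(2)])      # ['Grace', 'Ada']
print([e.salary for e in Employee.by_name("Linus")])  # [90]
```

The trade-off is visible in the two methods. `top_earners` requires knowledge of SQL. `by_name` does not, but it is confined to what `fetch_where` exposes.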
A difficulty arises when an object is retrieved from the database, and the retrieved object refers to a second object. If the second object has been previously retrieved from the database, so that it already exists within the object oriented environment, a pointer reference to the second object can be obtained and stored in the retrieved object. However, if the second object has not been retrieved from the database, so that it does not yet exist within the object oriented environment, a pointer reference cannot be used. Because the reference from the retrieved object to the second object cannot be resolved, the pointer to the second object must be marked as unusable in the retrieved object until the second object is retrieved. Once the second object is retrieved, the system must locate every previously retrieved object that holds an unusable pointer to the second object and update that pointer, so that the newly retrieved second object is accessible through it. This updating process can take considerable processing time, and programming for this situation is error prone.
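The pointer fix-up problem can be illustrated with a small in-memory simulation. `ROWS` stands in for database rows, and `load`, `loaded`, and `pending` are hypothetical names. The sketch shows the conventional approach described above, which marks the pointer unusable and then patches every waiting object when the target arrives. It does not present a solution to that problem.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stored rows: employee 1 refers to manager 2 by key.
ROWS = {
    1: {"name": "Ada", "manager_id": 2},
    2: {"name": "Grace", "manager_id": None},
}


@dataclass
class Employee:
    emp_id: int
    name: str
    manager: Optional["Employee"] = None
    manager_unresolved: bool = False  # marks an unusable pointer


loaded: Dict[int, Employee] = {}
# Key of a not-yet-loaded object -> objects whose pointers are waiting for it.
pending: Dict[int, List[Employee]] = {}


def load(emp_id: int) -> Employee:
    row = ROWS[emp_id]
    obj = Employee(emp_id, row["name"])
    ref = row["manager_id"]
    if ref is not None:
        if ref in loaded:
            obj.manager = loaded[ref]      # target already exists: store pointer
        else:
            obj.manager_unresolved = True  # target absent: mark pointer unusable
            pending.setdefault(ref, []).append(obj)
    loaded[emp_id] = obj
    # Fix-up pass: patch every earlier object that was waiting for this one.
    for waiter in pending.pop(emp_id, []):
        waiter.manager = obj
        waiter.manager_unresolved = False
    return obj


ada = load(1)
print(ada.manager_unresolved)  # True: manager 2 is not loaded yet
grace = load(2)
print(ada.manager is grace)    # True after the fix-up pass
```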
The concept of multiple logical views of data is well known in the database field, and this concept has been extended to the retrieval of objects from relational databases. In implementing logical views, a subset of the data in a database that corresponds to an object is retrieved and defined as an object in the object oriented environment. Other views of the database use a different subset of the data, likely with overlapping contents. Typically, when multiple views of the same database are retrieved, they are stored separately in memory. This results in duplicate storage of the overlapping data values and creates a coherency problem when one of the copies of the overlapping data is modified. A performance penalty is also incurred: when a new view is needed, it is not usually possible to retrieve only the data not already in memory, so duplicated data may be retrieved from the database more than once. This approach also violates the concept of object identity, under which each object has its own identity, even if it has the same data values as another object, and all references to that object point to a single copy.
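The duplication and coherency problem with separately stored views can be reproduced in a few lines. The sketch assumes SQLite and two hypothetical view classes, `ContactView` and `PhoneView`, that overlap on the `name` column.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)")
conn.execute("INSERT INTO customer VALUES (7, 'Ada', 'ada@example.com', '555-0100')")


@dataclass
class ContactView:  # hypothetical view 1: name + email
    cust_id: int
    name: str
    email: str


@dataclass
class PhoneView:    # hypothetical view 2: name + phone (overlaps on name)
    cust_id: int
    name: str
    phone: str


# Each view is retrieved and stored separately, so `name` is fetched
# from the database twice and held in two places in memory.
contact = ContactView(*conn.execute(
    "SELECT cust_id, name, email FROM customer WHERE cust_id = 7").fetchone())
phone = PhoneView(*conn.execute(
    "SELECT cust_id, name, phone FROM customer WHERE cust_id = 7").fetchone())

contact.name = "Ada Lovelace"          # modify one copy of the overlapping data
print(contact.name, "|", phone.name)   # Ada Lovelace | Ada
```

Both objects represent the same database row, yet they are distinct objects holding independent copies of `name`. This is the loss of object identity described above.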
Caching techniques are also well known in the computer industry. A cache reduces the processing time spent retrieving data from storage devices such as disk drives. When a set of data is retrieved, it is placed in the cache, in memory. A later request for the same set of data is satisfied from the cache copy rather than by retrieving the data from disk a second time. Techniques for maintaining coherency between the cache copy and the disk copy when one of them is modified are also well known.
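The sketch below shows a read-through cache with write-through updates, which is one common form of the technique described above. `Disk` is a hypothetical stand-in for a storage device that counts physical reads. A real cache would also bound its size and evict entries, and this sketch omits both.

```python
from typing import Dict


class Disk:
    """Hypothetical stand-in for a storage device; counts physical reads."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = blocks
        self.reads = 0

    def read(self, block: int) -> bytes:
        self.reads += 1
        return self.blocks[block]

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class ReadThroughCache:
    def __init__(self, disk: Disk) -> None:
        self.disk = disk
        self.memory: Dict[int, bytes] = {}

    def read(self, block: int) -> bytes:
        if block not in self.memory:          # miss: go to the disk once
            self.memory[block] = self.disk.read(block)
        return self.memory[block]             # hit: served from memory

    def write(self, block: int, data: bytes) -> None:
        # Write-through keeps the cache copy and the disk copy coherent.
        self.memory[block] = data
        self.disk.write(block, data)


disk = Disk({0: b"header", 1: b"payload"})
cache = ReadThroughCache(disk)
cache.read(1)
cache.read(1)
print(disk.reads)  # 1
cache.write(1, b"updated")
print(cache.read(1), disk.blocks[1])
```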
Traditional caching techniques typically use a single, monolithic cache associated with one or more storage devices. All requests for data on a device are processed by the same cache, regardless of which program submitted the request. While efficient in terms of processing time, this approach is undesirable in terms of software design. To use a cache efficiently, a software program must be able to determine which data should be cached and which should not. It must also be able to flush certain data from the cache when the data is no longer needed. With a traditional monolithic cache, this requires the program to interact with an entity outside the program, the cache, thus coupling each program that uses a storage device to the cache software for that device. This coupling to an external entity makes the program dependent on a specific system configuration, reducing its flexibility and restricting reuse of the program across multiple computer systems.
There is a need in the art for a method of retrieving data from a relational database into an object oriented environment that maintains object identity while eliminating the problems of duplicate storage and data coherency. There is also a need for such a system that can correctly resolve references to later loaded objects without the need to update pointers in preexisting objects. There is a further need for such a system to provide an in-memory cache without coupling the objects to an external entity. There is a still further need for such a system to provide flexible access to the database without requiring the object developer to know the query language of the database.