The invention relates to the interrelationship of databases, particularly relational databases, and object-oriented systems. More particularly, the invention relates to relationships between objects in object-oriented systems and descriptions of objects storable in field-delimited database structures. Field-delimited databases can structure data into fields which have common attributes. For example, relational databases can structure data into tables, each with columns and rows (in "n" dimensions) forming tuples, upon which certain operations in set algebra can be performed very conveniently.
Object-oriented applications (i.e., application programs) organize data and routines together into encapsulated units referred to as objects. Object-oriented applications lead to modular software systems which have increased flexibility and are easy to alter and maintain.
The difference between a relational database management system (RDBMS) and an object-oriented application is that an object "knows" what operations can be performed on its data, whereas an RDBMS only has a set of generic operations which can be performed on its tuples. For example, a "snow tire" object knows that it is related to an "axle" object and inherits from a "tire" object. In contrast, a relational database represents this information in three separate data tables with no explicit representation of the relationships between the tables. The tire table in a relational database might have foreign key information referring to the axle table, but this representation of the relationship between tire and axle is implicit. It is up to the application developer (i.e., computer programmer) to know about these relationships, what they mean, and how to handle them.
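By way of illustration only, the contrast drawn above can be sketched in Python. The class names, field names, and row layouts below are hypothetical and are not taken from any particular system. The object form carries its relationship and inheritance explicitly, whereas the table form leaves the foreign key to be interpreted by developer-written code.

```python
from dataclasses import dataclass


@dataclass
class Axle:
    axle_id: int
    position: str


@dataclass
class Tire:
    tire_id: int
    axle: Axle  # explicit reference to the related object

    def describe(self) -> str:
        return f"tire {self.tire_id} on {self.axle.position} axle"


@dataclass
class SnowTire(Tire):  # explicit inheritance from Tire
    stud_count: int = 0

    def describe(self) -> str:
        return "snow " + super().describe()


# Relational form: three separate tables held as lists of tuples.
# The second column of tire_rows is a foreign key into axle_rows, but
# nothing in the data itself says so; the developer has to know it.
axle_rows = [(1, "front")]   # (axle_id, position)
tire_rows = [(10, 1)]        # (tire_id, axle_id)
snow_tire_rows = [(10, 120)] # (tire_id, stud_count)


def axle_position_for_tire(tire_id):
    """Hand-coded join that encodes the developer's knowledge of the keys."""
    for t_id, axle_id in tire_rows:
        if t_id == tire_id:
            for a_id, position in axle_rows:
                if a_id == axle_id:
                    return position
    return None


front_axle = Axle(1, "front")
snow_tire = SnowTire(10, front_axle, 120)
```

Here `snow_tire.axle` follows the relationship directly, while the tabular form requires the hand-written `axle_position_for_tire` routine to reach the same information.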
A relational database is stateless. One database query has no connection to the next query and no memory of the previous query. Thus there is a desire to provide in the application explicit references between different database queries, by grouping these results in a unified data structure in the application. More particularly, there is a desire to manage a unified, cohesive data structure of object instances, a data structure that represents the results of multiple queries to a structured database. It is further desired that this structure represent the relationships between these objects such that these relationships can be followed without the need to query the database for this information. Still further, it is desired that this structure be managed in such a way that the data in the structure is at all times consistent with the corresponding information in the database.
In known systems, the developer of an application program that communicates with a structured database typically hand-codes routines which store the information retrieved from database queries in small data structures. These data structures typically have no connection to one another. For example, a developer retrieves invoice information from the database in a first query and then retrieves the line items for that invoice from the database in a second query, and stores the results of these two queries in a single data structure, such as an array, in the application program. This data structure has no relation to any other data structure built as the result of other database queries. In particular, if another query had previously been made for some of the same information, there would be two copies of this information in the application program, thus providing a potential for inconsistent versions of the data in the program and in the database. That is, the two copies of the information in the program could be inconsistent with one another and both copies could be inconsistent with the information in the database. There would be no explicit reference between the program's two copies of the information.
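The duplication problem described above can be reproduced in a few lines. The following is a minimal sketch using Python's standard `sqlite3` module. The `invoice` table, its columns, and the `fetch_invoice` helper are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.execute("INSERT INTO invoice VALUES (1, 'Acme', 100.0)")


def fetch_invoice(invoice_id):
    """Hypothetical hand-coded routine: copy one row into a fresh dict."""
    row = conn.execute(
        "SELECT id, customer, total FROM invoice WHERE id = ?", (invoice_id,)
    ).fetchone()
    return {"id": row[0], "customer": row[1], "total": row[2]}


# Two queries for the same invoice yield two unrelated copies.
first_copy = fetch_invoice(1)
second_copy = fetch_invoice(1)

# One copy is modified in the program, and the database is changed separately.
first_copy["total"] = 250.0
conn.execute("UPDATE invoice SET total = 175.0 WHERE id = 1")

database_total = conn.execute(
    "SELECT total FROM invoice WHERE id = 1"
).fetchone()[0]
```

After these steps the program holds two copies that disagree with each other and with the database, and nothing in the program links the copies together.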
As another example, suppose that a database contains personnel records for a company, and in particular contains tables that represent company departments and other tables that represent company employees. In known systems, a developer issues a separate database query each time he or she wishes to follow the relationship between an employee and department. Suppose the developer issues a database query which retrieves the department that is located in San Mateo. The result of this query is stored in a data structure, such as an array, in the application program. Next, suppose the developer issues a database query to determine which employees work in the department that is located in San Mateo. Suppose further that this query returns two rows from the Employee table, "Jane Smith" and "Sue Horn." These rows are placed into a data structure, such as an array, in the application program. Next, suppose that at some later time the developer wishes to determine what department Jane Smith works in. Because there are no references between the department data structure and the employee data structure, there is no link between Jane Smith and the San Mateo department. The developer will need to issue a third, separate query to the database to once again retrieve the San Mateo department.
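The personnel example above can be sketched as follows, again using Python's standard `sqlite3` module. The table layouts, the department name, and the `run_query` helper with its counter are hypothetical and serve only to make the three queries visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (1, 'Engineering', 'San Mateo');
    INSERT INTO employee VALUES (1, 'Jane Smith', 1);
    INSERT INTO employee VALUES (2, 'Sue Horn', 1);
    """
)

stats = {"queries": 0}


def run_query(sql, params=()):
    """Hypothetical helper that counts every round trip to the database."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Query 1: the department located in San Mateo, stored in its own array.
departments = run_query(
    "SELECT id, name, location FROM department WHERE location = ?",
    ("San Mateo",),
)

# Query 2: the employees of that department, stored in a second array.
employees = run_query(
    "SELECT e.id, e.name FROM employee e "
    "JOIN department d ON e.dept_id = d.id "
    "WHERE d.location = ? ORDER BY e.id",
    ("San Mateo",),
)

# Query 3: nothing links the two arrays, so finding Jane Smith's
# department means asking the database again.
janes_department = run_query(
    "SELECT d.id, d.name, d.location FROM department d "
    "JOIN employee e ON e.dept_id = d.id WHERE e.name = ?",
    ("Jane Smith",),
)
```

The third query returns the same department row that the first query already retrieved, which is the redundancy the paragraph above describes.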
In known systems, it is common to have hundreds or even thousands of such data structures in an application, each such structure having several potential relationships with other structures. This can lead to hundreds or thousands of unnecessary database queries. It will be appreciated that a mechanism for managing such structures which provides efficient performance and ensures consistency of data between such structures and the corresponding data in the database is desirable.
It is by no means a straightforward task to group the results of disparate queries into a cohesive data structure in an application. Among the principal problems are avoiding duplication of data within such a structure, ensuring consistency between data in the structure and data in the database by using database locks, and resolving the data integrity (coherency) issues associated with losing database locks when a database transaction is committed (i.e., when data is changed in the database). These problems are sufficiently complex that they have not been solved in the prior art. In known systems, developers typically work with small, atomic units of data which they create and then delete within the same routine to minimize such consistency problems.
If these problems of duplication avoidance, consistency, and data integrity can be solved, the cohesive data structure can provide a powerful tool for improving application performance. In particular, certain requests issued by the application can be resolved immediately by reference to the cohesive structure without any need to query the database.
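To suggest how such a cohesive structure might answer a request without a database query, the following minimal sketch keeps one in-memory instance per table and primary key and stores relationships as direct references. The `ObjectCache` class and its methods are hypothetical and are not drawn from any system described here. The sketch addresses only duplication avoidance and leaves aside the locking and consistency problems noted above.

```python
class ObjectCache:
    """Hypothetical structure holding one instance per (table, primary key)."""

    def __init__(self):
        self._instances = {}

    def intern(self, table, primary_key, attributes):
        """Return the single instance for this key, creating it if needed."""
        key = (table, primary_key)
        instance = self._instances.get(key)
        if instance is None:
            instance = {"table": table, "pk": primary_key, **attributes}
            self._instances[key] = instance
        else:
            # Refresh the existing instance rather than making a second copy.
            instance.update(attributes)
        return instance

    def lookup(self, table, primary_key):
        return self._instances.get((table, primary_key))


cache = ObjectCache()

# Rows as they might arrive from two earlier queries.
dept = cache.intern("department", 1, {"location": "San Mateo", "employees": []})
for pk, name in [(1, "Jane Smith"), (2, "Sue Horn")]:
    emp = cache.intern("employee", pk, {"name": name, "department": dept})
    dept["employees"].append(emp)

# Later request: which department does Jane Smith work in?
# The reference is followed in memory, with no further query.
jane = cache.lookup("employee", 1)
janes_department = jane["department"]
```

Because every row maps to exactly one instance, a later request for Jane Smith's department is resolved by following a reference, and a repeated retrieval of the same row updates the existing instance instead of creating a second copy.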
What is needed is an automated method and system to manage information retrieved from a structured database (such as a relational database or other field-delimited database) in a format suitable for use by an application program (such as an object-oriented application) that works with the structured database, in a manner that speeds performance and improves robustness of the application.
Systems are known for manual mapping between objects in knowledge bases and database management systems. One approach is to employ a static class library as an interface between an object-oriented system and a relational database. An example is METHOD FOR INTEGRATING A KNOWLEDGE-BASED SYSTEM WITH AN ARBITRARY RELATIONAL DATABASE SYSTEM, U.S. Pat. No. 4,930,071 issued May 29, 1990 and assigned to IntelliCorp, Inc. of Mountain View, Calif. In static-type systems, objects can be extended to handle concepts such as relationships and inheritance, but they must be manually extended if they are to model complex real world structures. This limits their usefulness to building relatively simple object models from existing data, such as those used in rapidly building prototype systems. It is believed that there are commercial systems which use the static-type class approach. Candidates include "ProKappa" from IntelliCorp, "DB.H++" from Rogue Wave of Corvallis, Ore., and possibly "Open ODB" from Hewlett Packard Company of Cupertino, Calif. and "UniSQL" from UniSQL of Austin, Tex.
In known relational databases, the technique of "page caching" can be used to speed performance by keeping certain frequently referenced pages in program memory rather than on a storage device. (A page is a unit of information; in Unix systems, for example, a page is typically 2048 bytes.) Page caching has several limitations. First, pages are cached without any understanding of their semantic content. Second, cached pages are independent and cannot refer to other cached pages. Often, a database query must be performed in order to discover that a needed page is already in the cache. Third, database rows often take up only a small portion of a page, so that the memory allocated to the page cache is used inefficiently.
In known object-oriented databases, the concept of "swizzling" can be used in conjunction with page caching. In object-oriented databases, object instances can point to one another through virtual memory pointers. In "swizzling," virtual memory pointers between object instances are converted into physical memory pointers between cached pages. This technique is used, for example, by the ObjectStore system from Object Design of Burlington, Mass. In ObjectStore, object instances are identified by object IDs, representing virtual memory addresses which are generated by the system; the developer has no flexibility in defining these object IDs. As each page is brought into program memory, these object IDs are converted into physical memory references. Swizzling as taught in known systems is not applicable to relational databases, because rows in a database are identified by arbitrarily defined primary key values rather than by system-defined virtual memory addresses.
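The general idea of swizzling can be illustrated loosely in Python, which has no raw pointers, by replacing stored object IDs with direct in-memory references when a page is loaded. This is a simplified sketch only. The page layout, the `swizzle` function, and the ID values are hypothetical and do not represent the ObjectStore implementation.

```python
# A stored "page": records keyed by system-generated object ID, each
# referring to another record by ID rather than by direct reference.
stored_page = {
    101: {"name": "tire", "ref": 202},
    202: {"name": "axle", "ref": None},
}


def swizzle(page):
    """Load a page and convert ID references into direct object references."""
    # Copy the records so the stored form is left untouched.
    resident = {
        oid: {"name": rec["name"], "ref": rec["ref"]}
        for oid, rec in page.items()
    }
    for obj in resident.values():
        target_id = obj["ref"]
        if target_id is not None:
            obj["ref"] = resident[target_id]  # ID replaced by a reference
    return resident


resident = swizzle(stored_page)
tire = resident[101]
```

After loading, `tire["ref"]` is the axle record itself rather than the ID 202. The conversion relies on the IDs being system-generated keys into the loaded page, which is the property that arbitrarily defined primary key values in a relational database do not share.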