Many application developers are adopting object-oriented programming languages and design techniques because of the advantages these languages offer over conventional programming languages. For example, object-oriented languages support the definition of complex objects, inheritance hierarchies, and behavioral properties of objects. For the same reasons, object-oriented representation is also considered more powerful than the relational data model used to define relational databases. However, many enterprises store their data in databases managed by relational database management systems, and object-oriented application programs often need to access this data for further processing.
Many new software applications have been, and are being, developed using object-oriented programming languages and techniques. Applications in areas such as office information systems, CAD/CAM, CASE, and geographic information systems have requirements that cannot easily be satisfied by traditional programming languages and design techniques. The type systems of object-oriented programming languages include constructs for defining complex objects, inheritance hierarchies, and behavioral properties of objects. An object's state can be encapsulated, so the implementation of an object class can be changed without breaking the application programs that use it. These characteristics make object-oriented programming languages better suited than conventional programming languages to advanced as well as traditional application domains.
Applications written in object-oriented programming languages need to interact with relational databases (RDBs) for three main reasons.
The first reason is the existence of legacy data. Many enterprises store their data in RDBs, and this data is a necessary input to many decision-making processes. Application programs written in object-oriented programming languages need to access this data for further processing.
The second reason is persistence. Application programmers often need some of the objects created by their programs to persist between program invocations. Because widespread, robust, scalable, industrial-strength object-oriented DBMSs are not available, relational DBMSs are a viable candidate for maintaining persistent data generated by object-oriented application programs. Relational DBMSs are favored over file systems, another candidate for storing persistent data, because they offer many useful functions such as concurrency control, recovery, physical data independence, and associative query capabilities.
The third reason is migration of legacy data to object-oriented DBMSs. Once standard commercial object-oriented DBMSs become available, many enterprises may want to migrate their legacy relational data to object-oriented databases managed by these DBMSs. Application programs can be written to facilitate this process: they access existing RDBs, retrieve data, construct objects by reformatting and assembling the retrieved data, and then store these objects in an object-oriented DBMS, thereby automating the reverse engineering of the data.
The structural data model, which defines classes and connections between classes, was introduced by G. Wiederhold and R. Elmasri in "The Structural Model for Database Design," Proceedings of the International Conference on Entity-Relationship Approach to Systems Analysis and Design, 1980. In this model, classes are relations (i.e., tables as defined in a relational database), and connections between relations are of three types. These three connection types were defined in the Wiederhold et al. article and further refined in the Barsalou et al. (91) article. In the detailed description of the preferred embodiments, we give the definitions of these connection types and extend them by modifying the definitions of some types and defining new types. The resulting set of connection types is used in O-R Gateway to guide the mapping from relational schemas to schemas defined in the C++ type system.
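To make the structural model concrete, the following minimal Python sketch represents a schema as relations joined by typed connections. It assumes the three connection types are named ownership, reference, and subset, as they commonly are in the structural-model literature, and it omits the modified and new types used in O-R Gateway. The function `cpp_fragment`, the relation names, and the mapping of each connection type to a C++ declaration are hypothetical illustrations, not the mapping performed by O-R Gateway.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectionType(Enum):
    # Assumed names for the three connection types of the structural model.
    OWNERSHIP = "ownership"  # source owns dependent tuples of target
    REFERENCE = "reference"  # source refers to a tuple of target
    SUBSET = "subset"        # source is a subset (specialization) of target


@dataclass(frozen=True)
class Connection:
    source: str           # relation the connection starts from
    target: str           # relation the connection points to
    kind: ConnectionType


def cpp_fragment(conn: Connection) -> str:
    """Hypothetical mapping of one connection to a C++ declaration fragment.

    Illustrative only; this is not the O-R Gateway mapping.
    """
    if conn.kind is ConnectionType.OWNERSHIP:
        # The owner class holds a collection of its owned objects.
        return f"std::vector<{conn.target}> {conn.target.lower()}s;"
    if conn.kind is ConnectionType.REFERENCE:
        # The referencing class holds a pointer to the referenced object.
        return f"{conn.target}* {conn.target.lower()};"
    # Subset: the subset relation becomes a class derived from its superset.
    return f"class {conn.source} : public {conn.target}"


# Hypothetical schema with one connection of each type.
schema = [
    Connection("Order", "OrderLine", ConnectionType.OWNERSHIP),
    Connection("Order", "Customer", ConnectionType.REFERENCE),
    Connection("PreferredCustomer", "Customer", ConnectionType.SUBSET),
]

for conn in schema:
    print(f"{conn.source} -[{conn.kind.value}]-> {conn.target}: {cpp_fragment(conn)}")
```

Each connection reads as "source owns / refers to / is a subset of target". A complete mapping would also have to handle attributes, keys, and the extended connection types described in the detailed description.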
The view-object model and its implementation in a system called PENGUIN were introduced by T. Barsalou and G. Wiederhold in "Complex Objects for Relational Databases," Computer-Aided Design, October 1990, and in Barsalou et al. (91). PENGUIN allows objects to be defined on top of a relational database. These objects are similar to relational database views, but their attributes are rearranged to remove redundancy and to reflect the nesting of constituent objects within more complex objects. In other words, instances of a view object are represented in non-normal form. The structural model described above is at the core of PENGUIN.
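The non-normal form can be illustrated with a minimal Python sketch. It assumes a view object with a single level of nesting and an inner join that yields at least one child row per pivot tuple; PENGUIN itself handles arbitrary trees of nested objects. The function `nest` and the example relations are hypothetical. Flat join rows repeat the pivot attributes once per child tuple; grouping on the pivot key stores those attributes once and collects the child tuples into a nested list.

```python
from typing import Any


def nest(rows: list[dict[str, Any]], pivot_key: str, pivot_attrs: list[str],
         child_name: str, child_attrs: list[str]) -> list[dict[str, Any]]:
    """Group flat join rows into nested (non-normal form) view-object instances."""
    instances: dict[Any, dict[str, Any]] = {}
    for row in rows:
        key = row[pivot_key]
        if key not in instances:
            # First row for this pivot tuple: store its attributes only once.
            instance = {attr: row[attr] for attr in pivot_attrs}
            instance[child_name] = []
            instances[key] = instance
        # Every row contributes one nested child tuple.
        instances[key][child_name].append({attr: row[attr] for attr in child_attrs})
    # Dicts preserve insertion order, so instances follow first appearance.
    return list(instances.values())


# Hypothetical result of joining Orders with OrderLine on order_id.
rows = [
    {"order_id": 1, "customer": "Acme", "line_no": 1, "item": "bolt"},
    {"order_id": 1, "customer": "Acme", "line_no": 2, "item": "nut"},
    {"order_id": 2, "customer": "Zenith", "line_no": 1, "item": "gear"},
]

orders = nest(rows, "order_id", ["order_id", "customer"], "lines", ["line_no", "item"])
for order in orders:
    print(order)
```

The redundant `customer` value of order 1 appears twice in the flat rows but only once in the nested instance, which is the redundancy removal the view-object model aims for.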
PENGUIN operates as follows. The schema of the relational database is presented to the user graphically, using the constructs of the structural data model: relations are nodes, and connections are links between relations, with a different graphical symbol for each of the three connection types. The user chooses one of the relations (nodes in the graph) as the "pivot" relation, and PENGUIN derives a candidate tree of relations and connections rooted at the pivot relation. The user then identifies a subtree by selecting the nodes to include in the view object (the pivot relation is included automatically and need not be selected explicitly). The resulting subtree represents a view object, which the user names. Next, PENGUIN generates a data access function, expressible as a SQL query, that includes the join predicates needed to retrieve the data pertaining to the view object. A user can then issue a predicate-based query against a view object to identify the set of instances of interest. PENGUIN combines the user's query with the data access function of the view object to produce a query that retrieves only the needed tuples from the relational database. Finally, PENGUIN assembles the resulting tuples into instances of the view object and represents them hierarchically.
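The candidate-tree and query-generation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not PENGUIN's algorithm: the candidate tree is taken to be a breadth-first spanning tree of a hypothetical schema graph, each link carries a single equi-join predicate, and the data access function is a plain inner-join SQL string to which the user's predicate is conjoined. The names `SCHEMA`, `candidate_tree`, and `access_query` are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical schema graph: relation -> list of (neighbor, join predicate).
SCHEMA = {
    "Customer": [("Orders", "Customer.cust_id = Orders.cust_id")],
    "Orders": [("Customer", "Customer.cust_id = Orders.cust_id"),
               ("OrderLine", "Orders.order_id = OrderLine.order_id")],
    "OrderLine": [("Orders", "Orders.order_id = OrderLine.order_id"),
                  ("Part", "OrderLine.part_id = Part.part_id")],
    "Part": [("OrderLine", "OrderLine.part_id = Part.part_id")],
}


def candidate_tree(pivot: str) -> dict[str, tuple[str, str]]:
    """Breadth-first spanning tree rooted at the pivot relation.

    Maps each reachable relation except the pivot to (parent, join predicate).
    """
    tree: dict[str, tuple[str, str]] = {}
    seen = {pivot}
    queue = deque([pivot])
    while queue:
        rel = queue.popleft()
        for nbr, pred in SCHEMA.get(rel, []):
            if nbr not in seen:
                seen.add(nbr)
                tree[nbr] = (rel, pred)
                queue.append(nbr)
    return tree


def access_query(pivot: str, selected: list[str], tree: dict[str, tuple[str, str]],
                 user_pred: Optional[str] = None) -> str:
    """Build a SQL query for the subtree made of the pivot plus `selected`."""
    nodes = [pivot] + [r for r in selected if r != pivot]
    for rel in nodes[1:]:
        # Each selected node's parent must also be selected, so the
        # selection stays a subtree rooted at the pivot.
        if rel not in tree or tree[rel][0] not in nodes:
            raise ValueError(f"{rel} is not connected to the pivot in the selection")
    conditions = [tree[rel][1] for rel in nodes[1:]]
    if user_pred:
        # Conjoin the user's predicate with the join predicates.
        conditions.append(user_pred)
    sql = "SELECT * FROM " + ", ".join(nodes)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


tree = candidate_tree("Orders")
print(access_query("Orders", ["OrderLine"], tree, "Orders.order_date >= '1995-01-01'"))
```

Rejecting a selection such as `["Part"]` without `OrderLine` keeps the selected nodes connected to the pivot. A breadth-first tree is only one possible choice; when the schema graph contains cycles, different traversals yield different candidate trees.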
In "Abstracting Relational and Hierarchical Data With a Semantic Data Model," Proceedings of the 6th International Conference on Entity-Relationship Approach, New York, Sal March (ed.), 1987, S. B. Navathe and A. M. Awong describe a 10-step process for mapping relations, attributes, and relational constraints into entities, weak entities, categories, relationships, and attributes expressed in the Entity-Category-Relationship (ECR) data model. In O-R Gateway, we chose not to use the approach of Navathe et al. because the structural data model is simpler than the ECR model, and some ECR constructs, such as relationships and categories, have no direct counterparts in the C++ type system.