Modern computer installations generate, manipulate, and store enormous quantities of data. Data base management systems have emerged as an indispensable component of such installations, serving the purpose of promoting efficient data storage and program design, enhancing file maintenance and modification, and eliminating data redundancy. The typical data base management system (DBMS) includes programs which interface with designers and users, accept and understand models or tables for subsequent use in organizing data, organize data according to the models or tables, store and retrieve the data in the actual data base contained in a computer storage subsystem, perform queries on the data, and generate reports based on the stored data.
A DBMS may be designed to store data according to any of a variety of data models, where the data model is the basic organizational concept for the underlying data base. These models, or schemas, for data base organization can be divided into several different classes, including hierarchical, network, relational, and entity-relationship. A detailed discussion of these types of databases may be found in "The Database Book," by Mary E. S. Loomis, Macmillan Publishing Company, New York, N.Y. 10022 (1987). The present invention is applicable to all of the above database schemas. The preferred embodiment is described with particular reference to the entity-relationship modelling methodology provided by the Repository Manager/MVS product, which uses the DB2 relational DBMS as a back end to manage the storage and retrieval of data on the computer hardware. Entity-relationship databases are discussed in "The Entity-Relationship Model--Towards a Unified View of Data," by Peter Chen, ACM Trans. on Data Base Systems, Vol. 1 (1976). Relational databases, and DB2 in particular, are discussed in "IBM Database 2: General Information," IBM Publication GC26-4373 (1990). The Repository Manager/MVS product is discussed in "Repository Manager: Concepts and Facilities," IBM Publication SR21-3608 (1990).
A significant problem in maintaining any data base whose data entries represent objects, events, people, or relationships in the real world is that, although those things may change over time, the typical DBMS maintains only a single version of any given entry, making it impossible to concurrently represent a thing in its past, present, and future states. A second significant problem, which arises in maintaining a data base which is shared among a plurality of users, involves the toleration of concurrent but independent work on the same data entries by different users without sacrificing the semantic consistency of the data. Yet a third problem in maintaining a data base involves maintaining a record of the state of the data base itself as it existed at given times in the past. Such information is often needed for error recovery and for audit-trail purposes. Typical solutions to this third problem involve taking "snapshots" of the data base and logging change activity, so that if necessary the data base can be "reconstructed" as it existed at some point in the past. This reconstruction is usually a time-consuming batch procedure, and a system so constructed cannot allow the past and current data bases to be accessed concurrently.
A solution to all of these problems is to maintain versions of the data entries. These versions may correspond to the different states of the real-world things represented, or to work in progress by different users. Such an approach is called versioning, and in general requires that the DBMS control the creating of the versions and all access to them, both to assure the semantic consistency of the data in all its versions, and to free users from the need to deal with the additional complexity that such a versioning scheme requires. Users of non-versioned data base systems sometimes simulate versioning by giving qualified names to the data base entries. However, this approach is undesirable, because it conceals from the DBMS the true identity of the things represented, and makes it impossible for the DBMS to verify the semantic consistency of the data base.
The generally preferred approach to implementing versioning is to provide direct versioning of entries in the DBMS, with the versioning managed by the DBMS to preserve the semantic validity of the data in the system. Such a system provides both parallel and serial versioning, with the capability for the user to define a hierarchy of versions, and to direct the DBMS to move versions of data from one hierarchy level to another. It also provides historical versioning of the database, allowing the user to view the data as it existed at any arbitrarily-selected time in the past. It provides a simplified programming interface that allows a user tool to interact with the data as though it were not versioned, with the choice of which version is seen being specified outside the program.
A versioned entity-relationship database management system (VDMS) typically includes several external interfaces for use in manipulating entities and relationships in the database. These interfaces make versioning transparent to the user. The add interface is an external interface for adding an entity or relationship. The instance added may or may not be the first version of the entity or relationship. The update interface is an external interface for updating an entity or relationship. An update may or may not result internally in the creation of a new version of the entity or relationship. The delete interface is an external interface for deleting an entity or relationship. A delete does not result internally in the deletion of an instance. It may or may not result internally in the creation of a new version of the entity or relationship, which is flagged as "deleted." The retrieval interface is an external interface for finding out what entities exist and what their properties are, including the relationships in which they are involved.
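The behavior of the four interfaces can be illustrated with a minimal Python sketch. It is not the interface of any actual VDMS: the class and method names are hypothetical, only a single point of view is modelled, and every update is assumed to create a new version (the text above says only that it may or may not). The point shown is that a delete appends a version flagged as "deleted" rather than removing anything, and that a later add need not create the first version of an entity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """One stored version of an entity (hypothetical structure)."""
    properties: dict
    deleted: bool = False


@dataclass
class VersionedStore:
    """Toy single-variant store; every change appends a new version."""
    history: dict = field(default_factory=dict)  # part key -> list of Version

    def _current(self, key: str) -> Optional[Version]:
        versions = self.history.get(key)
        return versions[-1] if versions else None

    def add(self, key: str, properties: dict) -> None:
        current = self._current(key)
        if current is not None and not current.deleted:
            raise KeyError(f"{key} already exists")
        # The added instance may or may not be the first version.
        self.history.setdefault(key, []).append(Version(dict(properties)))

    def update(self, key: str, changes: dict) -> None:
        current = self._current(key)
        if current is None or current.deleted:
            raise KeyError(f"{key} does not exist")
        self.history[key].append(Version({**current.properties, **changes}))

    def delete(self, key: str) -> None:
        current = self._current(key)
        if current is None or current.deleted:
            raise KeyError(f"{key} does not exist")
        # Nothing is physically removed; a flagged version is appended.
        self.history[key].append(Version({}, deleted=True))

    def retrieve(self, key: str) -> Optional[dict]:
        current = self._current(key)
        if current is None or current.deleted:
            return None
        return current.properties
```

A caller of this sketch sees an unversioned store through add, update, delete, and retrieve, while the history list retains every version, including the "deleted" one.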
In ER data management systems generally, when an entity is updated it retains all of its relationships, and when an entity is added it starts with no relationships. However, satisfying both of these requirements presents special problems for ER database management systems having versioning capability. The first problem involves relationship preservation during update. In a VDMS whose interfaces make versioning transparent, when the update interface is used to update an entity, a new version of the entity may be created by the VDMS. As long as the part key of the entity is not changed, if the retrieval interface is used to find out what relationships the entity is the source or target of, it should yield the same results before and after the update.
The second problem involves relationship absence after add. In a VDMS whose interfaces make versioning transparent, when the add interface is used to add an entity, there may already be versions of that entity which exist but are not seen from the current point of view. There may also already be relationships of which those entity versions serve as the source or target. After the entity is added, the retrieval interface should indicate that the entity is not the source or target of any relationships from the current point of view.
One way to satisfy the principle of "relationship preservation during update" is to have each relationship instance connect a specific version of a source to a specific version of a target, but when a new version of the relationship source or target is created via the update interface, automatically create new relationship instances connecting the new source or target version to all of the same instances the previous version was connected to. However, this can result in an enormous number of relationship instances. Consider the simple case of two entities connected by a relationship. If the update interface were used to update each entity five times, creating a new entity version each time, twenty-five relationship instances would result. Thus, in practice, and particularly for a VDMS which creates revisions to track past states of an entity, this approach is not feasible.
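The growth in relationship instances under this copy-on-update approach can be illustrated with a small Python simulation. This is only a sketch under assumptions: the function name and the (source version, target version) pair representation are hypothetical, and the count is expressed in terms of the total number of versions of each entity. Under those assumptions the number of instances equals the product of the two version counts, so five versions on each side give twenty-five instances.

```python
def count_links(versions_a: int, versions_b: int) -> int:
    """Count relationship instances when each new version copies the
    relationships of the version it replaces (hypothetical model)."""
    # Start with version 1 of each entity and one relationship instance.
    links = {(1, 1)}  # pairs of (version of A, version of B)

    # Each new version of A is connected to everything the previous
    # version of A was connected to.
    for new in range(2, versions_a + 1):
        links |= {(new, b) for (a, b) in links if a == new - 1}

    # Likewise for each new version of B.
    for new in range(2, versions_b + 1):
        links |= {(a, new) for (a, b) in links if b == new - 1}

    return len(links)


print(count_links(5, 5))  # 25
```

With more than two entities, or with revisions created on every change, the same multiplication applies to every relationship each entity takes part in.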
Another way to satisfy the principle of "relationship preservation during update" is to have each relationship instance connect any version of a source to any version of a target. However, this can cause problems satisfying the principle of "relationship absence after add" if the relationship source and target are identified by their part keys. For example, consider the following sequence of events, which deals with two entities (A and B) and a relationship between them (A.B). The example assumes a two-level variant hierarchy with Production as the root variant, and Test as its only child. Therefore, the search path for the Test point of view is Test → Production, and the search path for the Production point of view is Production.
1. From the Production point of view, the add interface is used to add a Production version (the first version) of two entities A and B. These instances are seen from both the Production and the Test points of view.
2. From the Test point of view, the delete interface is used to delete the entity A. A "deleted" Test version of A is created. The Production version of A is still seen from the Production point of view.
3. From the Production point of view, the add interface is used to add a Production version (the first version) of relationship A.B. This instance is seen from the Production point of view. In some VDMS's, A.B might also immediately become part of the Test point of view, while in others, it might only become part of the Test point of view after some action is taken to refresh the Test point of view with recent changes from the Production point of view. In either case, once A.B becomes part of the Test point of view, it does not serve to connect A and B there, because A has been deleted.
4. From the Test point of view, the add interface is used to re-add entity A. A Test version of A is created.
5. From the Test point of view, which includes relationship A.B, the retrieval interface is used to find out what relationships A is the source or target of. Because we have assumed that relationships connect any version of their source to any version of their target, A is found to be related to B via relationship A.B, in violation of the principle of "relationship absence after add."
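The five steps above can be replayed in a short Python sketch. It is a toy model, not the mechanism of any actual VDMS: all names are hypothetical, entity versions are reduced to a "live" or "deleted" state, relationships identify their endpoints by part key only, and A.B is assumed to become part of the Test point of view immediately through the search path.

```python
# Search paths for the two-level variant hierarchy in the example.
SEARCH_PATH = {"Production": ["Production"], "Test": ["Test", "Production"]}

entities = {}          # (variant, part key) -> list of "live"/"deleted" states
relationships = set()  # (variant, source part key, target part key)


def visible_state(view, key):
    """State of the first version found along the view's search path."""
    for variant in SEARCH_PATH[view]:
        versions = entities.get((variant, key))
        if versions:
            return versions[-1]
    return None


def add_entity(view, key):
    entities.setdefault((view, key), []).append("live")


def delete_entity(view, key):
    # A "deleted" version is created in the current variant; nothing is removed.
    entities.setdefault((view, key), []).append("deleted")


def add_relationship(view, source, target):
    relationships.add((view, source, target))


def related_to(view, key):
    """Partners of `key`, matching endpoints by part key only (any version)."""
    if visible_state(view, key) != "live":
        return []
    found = []
    for variant, source, target in relationships:
        if variant not in SEARCH_PATH[view] or key not in (source, target):
            continue
        other = target if key == source else source
        if visible_state(view, other) == "live":
            found.append(other)
    return found


add_entity("Production", "A")             # step 1
add_entity("Production", "B")
delete_entity("Test", "A")                # step 2
add_relationship("Production", "A", "B")  # step 3
after_step_3 = related_to("Test", "A")    # [] because A is deleted in Test
add_entity("Test", "A")                   # step 4
after_step_5 = related_to("Test", "A")    # step 5: ["B"]
print(after_step_3, after_step_5)
```

The re-added A picks up A.B even though, from the Test point of view, it was just added and should have no relationships. Matching endpoints by part key alone gives the retrieval no way to tell the re-added A from the Production version the relationship was created against.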
Thus, the above methods fail to provide for both relationship preservation during update and relationship absence after add in a manner that does not result in excessive multiplicity of data. Moreover, no other method has heretofore been available which provides a VDMS with the ability to resolve sources and targets of relationships while complying with these requirements.