1. Field of the Invention
The invention relates to database systems generally and more particularly to versioned relational database systems.
2. Description of Related Art: FIGS. 1, 4, 5, and 8
Database systems are systems which store data and permit users of the systems to access items of the data by means of queries which describe the data in terms of the structure given the data by the database system. A common kind of database system is a relational database system. In such systems, the data is organized as a set of tables. A relational database table has a fixed number of columns and a variable number of rows. Each row has a field corresponding to each column, and the field contains a value. Queries on relational databases specify the data to be accessed in terms of the table or tables that contain it, columns in the table, and values of fields in some of the specified columns. For example, a simple table employees might look like this:
emp_noemp_name001Jones002Smith003Andrews004ToddThe table has two columns, named emp_no, whose fields contain employee numbers, and emp_name, whose fields contain employee names, and four rows. A query that returned the name of the employee with the employee number “002” would look like this in the standard SQL language used with relational database systems:                SELECT emp_name FROM employees WHERE emp_no=002;When the database system executes the query, it finds the row in the table employees whose field in the column emp_no has the value “002” and returns the value of the field in the row belonging to the column emp_name, or “Smith”.        
FIG. 1 shows the portions of a typical relational database system 101 that are relevant to the present discussion. The main components of system 101 are a processor, a memory 103 which contains programs 105 being executed by the processor and data 113 involved in the program executions, and persistent storage 123 for the database system's tables and other objects. Processor 121 may further receive inputs from input devices such as a keyboard and/or pointing device and produce outputs to a display device such as a CRT, as shown at 122, and may also receive inputs from and provide outputs to one or more networks, containing other processors, as shown at 124.
When system 101 is operating, programs 105 in memory 103 include an operating system 107, a relational database system program 109, and application programs 111 that employ the services provided both by operating system 107 and database program 109. Correspondingly, data 113 in memory 103 includes data for the application programs, data for the operating system, and data for the database system. Operation typically involves an application program 111, which provides a query to relational database program 109. Database program 109 executes the query on the tables in persistent storage 123 and provides the result to application program 111. Both RDB program 109 and application program 111 use the services provided by operating system 107, and execution of the application program or the database RDB program may involve inputs from and outputs to I/O devices and the network.
Continuing in more detail, persistent storage 123 contains two classes of objects: DB system objects, which are objects, including tables, that database system 101 uses to manage and operate the database system, and user objects 129, which contain tables and other objects defined by users of the database system. In the present context, the only system objects which are important are those belonging to data dictionary 127, which contains definitions of all of the objects in the database system.
User tables include base tables 131, views 118, and materialized views 141. Base tables 131(a and b) are the tables that are the actual sources of the data returned by a query. Views are tables which do not exist in their own rights in persistent storage 123, but are instead created using data from other tables. Data dictionary 127 contains definitions of base tables, of tables defined in terms of the base tables, and definitions of other objects that are defined for the tables. These other objects include indexes, which speed up access to the data contained in a column of a table, triggers, which define actions to be taken upon occurrence of events concerning the table, and constraints, i.e. rules about the values that must be in the fields.
A view is defined in the data dictionary by a query on other tables. The other tables may also be views, but the data must ultimately come from base tables. View 118 contains four columns and three rows. The data in columns 1 and 2 comes from columns 1 and 2 of base table 131(a); the data in columns 3 and 4 comes from columns 3 and 4 of base table 131(b); the query that defines view 118 has selected the fields of columns 1 and 2 of rows 3-5 of table 131(a) and the fields of columns 3 and 4 of rows 2, 6, and 8 of table 131(b). When relational database system 101 executes a query on a view 118, it must first make the view, which requires running the query which defines the view; consequently, if the view is frequently queried, the view may be made into a materialized view 141, which is a copy of the view which has been stored in persistent storage 123. Data dictionary 127 keeps track of the fact that there is a materialized view 141 corresponding to view 118, and database system 101 redirects a query of view 118 to materialized view 141.
Versioned Database Systems: FIG. 4
A versioned database is one in which different versions of the database exist simultaneously. What users of the versioned database see is the different versions, rather than the underlying base tables. Versioned databases are typically used in research and development situations: when a new line of research which will affect an existing database begins, a new version of the database is associated with the line of work, and the results of the research are incorporated into the new version of the database. When the work is to the point where it can be incorporated into the existing database, the new version is merged into the existing database.
A commercially-available system for creating and managing versioned databases is Oracle Workspace Manager, a versioned relational database system (VRDBS), implemented in the Oracle 9i™ database system manufactured by Oracle Corporation and described in detail in the Oracle 9i Application Developer's Guide-Workspace Manager, Release 1 (9.0.1), Part Number A88806-01, available in March, 2002 at technet.oracle.com. This publication is incorporated by reference into the present patent application. FIG. 4 presents an overview of a versioned relational database system 401 and FIG. 5 presents relevant details of its implementation in the Oracle Workspace Manager. Shown in FIG. 4 is an example table, emp_table 403, which is a table of employee information. There is a row for each employee. The table has four columns: emp_id 404, which contains a unique identifier for each employee, name 405, whose fields contain the employees' names, salary 407, whose fields contain the employees' salaries, and address 409, whose fields contain the employees' addresses. The values of the emp_id fields are the primary keys for the table and must consequently obey a uniqueness constraint within the table. As will be explained in more detail in the following, metadata may be added to emp_table 403 to make it possible to maintain different versions of the table. A table to which such metadata has been added is termed a version-enabled table.
Workspaces and Versions
In the versioned relational database, the different versions of emp_table 403 are maintained simultaneously, with independent access to each of the versions. Access to a given version is by means of a workspace. A workspace has a number of functions in the preferred embodiment:                it is a virtual environment that one or more users can share to make changes to versions of a version-enabled table.        it logically groups versions of one or more version-enabled tables, and isolates these versions until they are explicitly merged with production data or discarded, thus providing maximum concurrency; and        the relationships between workspaces determine how changes can be propagated from one version to another.Users can perform a variety of operations involving workspaces: go to navigates among workspaces; create, remove, and compress; refresh and merge propagate changes from a version in one workspace to a version in another workspace, and rollback returns to an earlier version in a workspace.        
There are two types of workspaces: non-continual refresh (non CR) and continual refresh (CR). The non CR workspace has a frozen view of version-enabled table data; changes made in any of the parent workspace of the non CR workspace after the workspace was created will not be visible from the workspace. In contrast, the CR workspace has a continually updated view of the version-enabled tables. Any changes made in the parent workspace also will be instantly visible from the child.
When a table is version-enabled, the table as it exists at the time it is version-enabled is the first version. This first version exists in a single workspace, termed the LIVE workspace. The first version may be modified in the same way as the table prior to versioning.
A new version is established by a savepoint operation, which freezes the old version. The old version is made read only, and further modifications are made on the new version. The version in a workspace on which modifications may be made is termed the workspace's current version. A user of a workspace may explicitly specify a savepoint operation, and there is an implicit savepoint operation whenever a new workspace is created. A new workspace is always created as a child of an existing workspace, and the child workspace starts off with a newly created version that is a copy of the parent workspace's current version as of the time the child workspace is created; the implicit savepoint further creates a new version in the parent workspace, and that version is now the current version in the parent workspace. The current versions in the parent workspace and the child workspace may of course be modified independently. A given workspace may thus have a linear set of versions that are descendants of the version made when the workspace was created and any of the versions in the linear set may have descendants in workspaces that are descendants of the given workspace. The versioned database system includes metadata which keeps track of the hierarchy of versions in the system.
FIG. 8 at 800 shows how workspaces relate to each other, how versions relate to each other, and how workspaces and versions relate to each other. Workspace hierarchy 801 includes workspaces 803(1 . . . 5), which were created in the order in which they are numbered. Each workspace contains versions 805 of a version-enabled table; versions 805(1 . . . 13) are numbered in the order in which they were created in workspace hierarchy 801. At the moment shown in hierarchy 801, the current versions are 805(6), 805(9), 805(13), 805(10), and 805(12). Of course, a given workspace may include versions of a number of version-enabled tables. The version hierarchy for the versions is shown at 807. A branch of the version hierarchy is a path through the hierarchy from the root version to a single leaf version. One such branch 809 is shown by means of heavy arrows. The ancestors of a given version are the versions that are above it on the version's branch of the hierarchy. Thus, the ancestors of v 805(9) are v 805(7), v 805(4), v 805(2), and v 805(1). The descendants of a given version are all of the versions below the given version in all of the branches of the hierarchy that pass through the given version. Thus, all of the versions are descendants of v 805(1), while the descendants of v 805(3) are v 805(5)80, v 805(13), and v 805(6). There are similarly branches, ancestors, and descendants in the workspace hierarchy.
One application of a versioned relational database is doing “what if” exercises with different versions of the data in the database system. In FIG. 4, such an exercise is being performed on a version enabled table, emp_table 403. When emp_table 403 was version enabled, the result was LIVE workspace 410(0) containing live version 403(0) of emp_table 403. A user of the versioned relational database has made three new workspaces 410(1 . . . 3) containing versions 403(1,3,5) of emp_table. When workspace 401(1) was made, it contained version 403(1) and version 403(2) was made the current version of workspace 410(0). Both version 403(1) and version 403(2) are copies of version 403(0). Similarly, when workspace 410(2) was made, it contained version 403(3) and the new current version of workspace 410(0) was version 403(4) and when workspace 410(3) was made, it contained version 403(5) and the new current version of workspace 410(0) was version 403(6). Live emp_table 403 (0,2,4) was not modified during creation of workspaces 410(1,2, and 3), so versions 403(0,2,4,6) are identical with each other, with version 403(6) being the current version in workspace 410(0). The version hierarchy for versions 403(0 . . . 6) of emp_table 403 is shown at 411.
Having made the new version, the user then modifies the new version in the version's workspace as required. In the example, current version 403(6) in live workspace 410(0) is a table of the current employees. One of the workspaces, 410(1), contains a pessimistic version 403(1) for a business scenario in which times are hard and employees must be reduced to a minimum; one 410(2), a current level version 403(3), for a scenario in which the present business conditions continue and the number of employees needs a smaller reduction, and one 410(3), an optimistic version 403(5), for a scenario in which business improves and only a minimal reduction need be considered.
Workspace hierarchy 401 has at its top workspace 410(0) for live emp table versions 403(0,2,4,6). Current version 403(6) is currently available to users other than the ones who are making the versions. At the next level are the three workspaces 410(1 . . . 3) for table versions 403(1,3,5). To make an additional workspace based on any workspace in the hierarchy, one simply goes to that workspace and makes the new workspace. The additional workspace is a child of the workspace in which it was made. Change propagation operations are available to propagate changes in a workspace's version up or down the workspace hierarchy to a version in another workspace. The version to which the change is propagated replaces rows which have changed less recently than the corresponding row in the version from which the change is propagated with the corresponding row in the latter version. Thus, if times are hard, workspace 410(1) may be merged with workspace 410(0). After the merger, the current live version 403(6) will have the most recently modified rows from pessimistic version 403(1) and live version 403(6).
Change Propagation Operations in the Workspace Hierarchy: FIG. 8
Operations in the VRDBMS which may result in propagation of changes from one table version to another are the following:                when a data manipulation (DML) operation is performed on a record in a table version in a workspace. DML operations are always performed on the current version.        when a DML operation performed on the current version of a table a workspace results in a continual refresh of the current versions of the table in descendants of the workspace that have been created with the continual refresh operation; thus, if workspaces 803(2) and 803(3) are continual refresh (CR) workspaces, continual refresh will apply any change made in current version 805(6) in LIVE workspace 803(1) to current version 805(9) in workspace 803(2) and to current version 805(13) in workspace 803(3).        when a user performs a refresh operation on a workspace; refresh propagates the changes in the current version in a parent workspace relative to the current version in a child workspace to the current version in the child workspace; thus a refresh operation that specifies workspace 803(4) applies changes from current version 805(9) in workspace 803(2) to current version 805(10) in workspace 803(4).        when a user performs a merge operation on a workspace; merge propagates the changes in the current version in a child workspace relative to the current version in the child's parent workspace to the current version in the parent workspace; thus a merge operation that specifies workspace 803(4) propagates changes from current version 805(10) to current version 805(9) in workspace 803(2).        when a user performs a DDL operation that redefines a version-enabled table. Such an operation of course affects every version of the table.Implementation of the Versions: FIG. 5        
In the Oracle Workspace Manager, the table versions in the workspaces are implemented by displaying the results of queries on tables. The queried tables may be views or base tables. The tables contains not only the information that is of interest to the users of the versioned database, but additional columns for version information which the database system uses to generate the query results corresponding to each of the versions. FIG. 5 shows a redefinition 501 of emp_table 403, emp_table_LT 501, which contains version info columns 502 and additional rows of information 511(1 . . . 3). The information in the version information columns and in the additional rows permits the generation of results corresponding to tables 403(0,3, and 5) from table 501, as shown at 513, 521, and 523.
Continuing with version information columns 502, there are four such columns:
                version number 503: the value in a row's field in this column is the number of the version that the row with its present contents was created in.        child version numbers 505: this field contains a list of numbers of descendant versions of the version specified at 503 where the row has been modified.        deleted flag 507: this field indicates whether the row has been deleted in the version indicated in field 503; and        lock flag 509: this field indicates whether the row is currently locked.Changes are made in table 501 by users working in various workspaces. Changes may include modifications of values in fields of the table and addition or deletion of rows.        
When a row is changed in a version of the table so that it is no longer identical with a row higher in the hierarchy, the new version of the row is added to table 501, with version number 503 set to indicate which version the added row was changed in. Child version field 505 of the row in the parent version that the changed row was formerly identical with is changed to indicate that the parent row is no longer contained in the child version. When a row that is present in a version at a higher level of the version hierarchy is included in a version at a lower level of the hierarchy, child versions 505 is updated in the row belonging to the higher-level version to indicate the fact that the row is included in the lower-level version.
In FIG. 5, live emp_table 403(0) has rows for six employees; these rows appear at 511(0) in emp_table_LT 501. The versions in workspaces 410(1-3) differ from the version in workspace 410(0) in that in the query results for each version, a greater or lesser number of rows are deleted from version 403(0). In pessimistic version 403(1), rows for three employees are deleted; in current level version 403(3), rows for two employees are deleted; in optimistic version 403(5), only 1 row is deleted. Delete flag 507 is used to indicate whether a row is deleted in a given version. Thus, as shown in rows 511(1), pessimistic version 403(1) is specified in table 501 by including a row for each employee whose row is to be deleted. Version number field 503 indicates that these rows have been changed in version 1 and delete flag 507 indicates that the rows have been deleted in that version. Current level version 403(3) is specified in the same way at 511(2), and optimistic version 403(5) is specified at 511(3).
In a preferred embodiment, the views for the workspaces are generated dynamically from a view which appears in built-in objects 515 as emp_table view 518. This view is simply a view of emp_table LT 501 which does not include the columns that contain version information 502. For a given version v 503, the view (v) for the version v shows the set of rows from <table_name>_LT such that the deleted flag field of the row is ‘N’ and one of the following holds:                the row is tagged with version v.        the row is tagged with a version v′ that is at a higher level in the hierarchy than v AND the child version field of the row does not contain a version that is between v′ and v in the hierarchy.        
To make the result 513 corresponding to pessimistic version 403(1), the database system returns a result which includes all rows of emp_table 518 which correspond to rows of emp_table_LT 501 for which the deleted flag is ‘N’ and one of the following holds:                the row is tagged with version 1        the row is tagged with version 0 and the child version field of the row does not contain the version 1.        
Thus, the rows belonging to 511(1 . . . 3) are all excluded because their delete flags are set to “Y” and those rows in 511(0) are excluded whose child vers field 505 contains the value “1”, leaving the rows whose name fields 405 have the values Finch, Johnson, and Meyers, which is the result shown at 513. The results for the other versions 521 and 523 are produced in the same fashion.
As shown at 515, in addition to emp_table view 518, the versioned relational database system includes auxiliary views 517 for showing conflicting rows on a merge operation, for showing locked rows, for showing differences between two versions, and for simultaneously showing data for multiple versions. The versioned relational database system also includes INSTEAD_OF triggers 519 for transforming insert, update, and delete operations on view 518 into operations on the proper rows of the tables used to generate the query results for the various versions.
Another example of a versioned database system is disclosed in U.S. Pat. No. 5,890,166, Eisenberg, et al., Versioned database management system in which tasks are associated with promote groups which comprise a set of parts whose changes are to be promoted, issued Mar. 30, 1999. A feature of the versioned database system of Eisenberg which is of particular interest in the present context is that the versioning is done at the level of individual rows, as well as at the level of subsets of tables, as in the Oracle Workspace Manager. Eisenberg terms his versions at the subset level variants. Variants are thus the equivalent of workspaces. Like workspaces, variants are strictly hierarchical. At the level of the row, Eisenberg's versioned database system records the previously existing version or versions of a row that a given row was derived from and provides derivation graphs for its records. Because the rows may be derived from more than one version, the derivation graph is a directed acyclic graph, or DAG.
A problem with both the Oracle Workspace Manager and Eisenberg's versioned database system is that at the level of variants or workspaces, both the Oracle Workspace Manager and Eisenberg's versioned database system are organized hierarchically, that is, a given variant or workspace may be derived from only one previously-existing variant or workspace. The variant or workspace from which the given variant or workspace is derived is often termed the given subset's parent. The limitation that a given variant or workspace may have only one parent often prevents the organization of the versioned database system from reflecting the way in which the work on the versions is done. To give a simple example: the workspaces of FIG. 4 are being made to explore alternative layoff strategies; such strategies are typically the concern both of the human resources department, which is specialized in such matters, and of the general management of the company, which must live with the consequences of the strategy that is finally adopted. What is needed to reflect this organizational reality is a versioned database system which permits a workspace to have more than one parent workspace, so that the workspaces for the layoff strategies may have a HR parent workspace and a management workspace. In graph-theoretic terms, what is needed is a versioned relational database system in which parent-child relationships between workspaces may have the form of a DAG as well as that of a hierarchy. It is an object of the techniques disclosed herein to provide such a versioned relational database system.