This invention relates to database management systems and more specifically to a method of differentiating between versions of database table rows representing the same information items at differing points in time.
Database management systems, e.g., relational database management systems, were designed to provide a means of storing information in a highly organized fashion. Each relational database contains tables which represent groups of related pieces of information. Each piece of information in a table is referred to as a table column. Multiple rows can be created for each table that contains a set of column values for specific entities.
Consider as an example, a situation in which a company is required to retain information in a relational database concerning its employees and the rate of pay that each employee receives at any given point in time. Two tables may be created, one to house information about the employees, i.e., an EMPLOYEE TABLE, and the other to house information concerning the pay rates of the employees, i.e., an EMPLOYEE.sub.-- PAY table. The EMPLOYEE table might contain columns such as an employee identifier and a name. The EMPLOYEE.sub.-- PAY table might contain columns such as an employee identifier, a pay rate, and the first date on which the pay rate is considered the effective pay rate for the employee. The relational database for this example can be represented as follows:
______________________________________ EMPLOYEE TABLE EMPLOYEE.sub.-- PAY TABLE EMPLOYEE.sub.-- ID EMPLOYEE.sub.-- ID NAME PAY.sub.-- EFFECTIVE.sub.-- DATE PAY.sub.-- RATE ______________________________________
Rows are created in the EMPLOYEE table as information is added to the database for each employee in the company. Similarly, rows are created in the EMPLOYEE.sub.-- PAY table as information concerning the pay rates of the employees is added to the database.
The EMPLOYEE.sub.-- PAY table is a versioned table. Several different versions of the pay rate for a specific employee could exist in the rows of the EMPLOYEE.sub.-- PAY table. These versions could represent historical pay rates, active pay rates, and future pay rates.
Although versioned tables can be used to capture differing versions of the same information at various points in time, retrieval of information from versioned tables is extremely complex. The effective values used in versioned tables (like the PAY.sub.-- EFFECTIVE.sub.-- DATE of the EMPLOYEE.sub.-- PAY table) actually represent a range of values as opposed to the single value contained in storage for the column. The value maintained in storage merely represents the first value in the range. The range of effective values may, in addition, be dependent upon other rows in the table with the same logical primary key. In the case of the EMPLOYEE.sub.-- PAY table, the logical primary key is the EMPLOYEE.sub.-- ID column. To determine the full range of date values on which a particular EMPLOYEE.sub.-- PAY table row is considered effective, other rows with the same EMPLOYEE.sub.-- ID must be examined. These implied rules must be considered while retrieving information from versioned tables, complicating the logic involved. Currently, there are few alternatives available to requesters of relational databases that allow them to differentiate between the rows of versioned tables. All existing methods have serious deficiencies.
In most cases, the task of differentiating between the rows of versioned tables is left entirely to the requester. The requester must include logic in their requests that identifies the version or versions that they require. The additional commands often include a complex SQL construct called a `subquery` that extracts the exact effective value of the version desired by the requester. By way of example, the SQL for a typical request that retrieves the current pay rate for employee `12345` from the EMPLOYEE.sub.-- PAY table is as follows:
______________________________________ SELECT PAY-RATE FROM EMPLOYEE WHERE EMPLOYEE.sub.-- ID = `12345` AND PAY.sub.-- EFFECTIVE.sub.-- DATE = (SELECT MAX (PAY.sub.-- EFFECTIVE.sub.-- DATE) FROM EMPLOYEE WHERE EMPLOYEE.sub.-- ID = `12345` AND PAY.sub.-- EFFECTIVE.sub.-- DATE &lt; CURRENT DATE) ______________________________________
Most of the logic involved in version differentiation is too complex for inexperienced requesters and adds a significant amount of development time for requesters who are experienced.
There are other problems inherent with including version differentiation logic in requests. The same version differentiation logic must be included in multiple requests, increasing the amount of maintenance required in the event that database changes occur. Reliability is also decreased since the version differentiation logic must be re-written in every request leading to a higher probability of error. The subquery logic that must be included using this approach is also relatively inefficient.
Another method of accomplishing version differentiation is to include the differentiation logic in database views. In the example presented, a view could be created which retrieves active EMPLOYEE.sub.-- PAY rows based on the CURRENT DATE system register. Using this method, a separate view must be created for each additional variable that can be used to reference effective values for retrieval. If a requirement existed to retrieve historic or pending employee pay rates from the example database, additional views must be created to accomplish this. By employing this method, inexperienced requesters are not required to include the complex subquery logic in their requests, but they are required to know which view includes the reference variable to achieve the desired results. Since the subquery logic may need to be repeated in multiple views, the maintenance and reliability problems described earlier are diminished, but still exist. The creation and maintenance of views also places an additional burden on database administration staff.
In some cases, database developers choose to create denormalized columns containing the current effective status of each row of a versioned table. An example of this method would be the creation of an effective status column in the EMPLOYEE.sub.-- PAY table of the example database. Although this method does alleviate some of the problems related to using the version differentiation logic, it places an additional burden on the data maintenance staff. Procedures must be created which maintain the effective status columns when changes occur which affect their values. Procedures must also be created and run on a regular basis in the case where the effective values are temporal, such as dates or times. In these cases, the effective status may change as a result of the passing of time.
An additional problem with this method is that only one version may exist with a particular effective status. If more than one is allowed to exist, version differentiation logic must be included in requests negating any benefits gained.
Database developers often choose to create multiple tables containing information that could be otherwise contained in a single versioned table. One of the tables may contain current versions, another may contain historic versions and a third may contain pending versions. Applying this approach to the example database, a column could be added to the EMPLOYEE table to contain a current pay rate, a table could be created to contain historic pay rate information and another could be created to hold pending pay rates. It does become easier to retrieve information from current versions, but the version differentiation logic must still be included when accessing historic or pending versions. In addition, requesters are required to know which of these tables must be accessed in different situations. Processes must also be created which move versions from one table to another as their effective status changes from pending to active, or from active to inactive.
Regardless of the method implemented, database management systems lack the ability to perform certain functions that are required to accurately process versioned tables in a relational database context. Database management systems can only enforce referential constraints on a strict column matching basis. This is not appropriate in the case of versioned tables. The effective values contained in versioned tables actually represent a range of values for which a particular row is effective. Currently, database management systems are unable to recognize these `effective windows` and are therefore unable to enforce referential integrity on an effective window basis. Nor are database management systems able to prevent overlapping of the effective windows of rows of a versioned table which represent the same information. Edit procedures must be created by the data maintenance staff to enforce these restrictions.
U.S. Pat. No. 5,280,612 to Lorie et al. (1994), U.S. Pat. No. 5,333,310 to Sakai (1994), and U.S. Pat. No. 5,440,730 to Elmasri et al. (1995) all disclose inventions which attempt to solve the efficiency problems related to retrieving information from versioned tables. The inventions disclosed in these patents do not attempt to significantly simplify the method by which information is requested from versioned tables on a database. Nor do they attempt to alter the database management system significantly in order to allow it to process versioned tables in a manner that prevents overlapping of effective windows or allows referential integrity to be enforced on an effective window basis.
U.S. Pat. No. 5,347,653 to Flynn et al. (1994), U.S. Pat. No. 5,357,631 to Howell et al. (1994), and U.S. Pat. No. 5,386,559 to Eisenberg et al. (1995) all disclose inventions that propose various methods of allowing database management systems to accommodate versioned tables. All of the inventions disclosed require that extensive modifications be made to the database management system. The three methods described represent a departure from the method by which non-versioned tables are processed in a relational database. None of the three inventions attempt to significantly simplify the method of requesting historical, active and pending information from versioned tables in a relational database. Nor do they allow for the database management system to enforce referential integrity in a manner that is consistent with that of non-versioned tables.