1. Field of the Invention
The invention relates generally to computer-based storage of events and more particularly, to the optimized representation of events and their application to relational databases.
2. Description of the Related Art
For many years, the database model at the heart of many computer applications has been the Time Relational Data Model. This model uses date-times stored in database records—mainly relational databases—to create histories that record the changing events in the world in which people live—event date-times. The model also uses date-times stored in database record to record the date-times when computer processes take place—timestamps.
Database tables keep track of real-world events such as when someone is hired, when they get promoted, when they get married, when they have children, and when they retire. Because processes like counting the number of employees a company has, or reporting the people covered under employee insurance plans, or paying employees are based on the state of the data for a reporting period or for a particular point in time, the Time Relational Data Model is relied upon to provide time-based data. In some implementations of the TRDM, a single event date-time is stored in each record. That date-time value, the Effective Date, records the date-time that a real-world event took place. Other implementations simplify queries by adding an End Date.
Often a record is placed in the database after the real-world event has taken place. For example, an employee is hired and starts work at 9 o'clock on a Monday morning. The first thing she does is fill out some forms and turn them into the personnel department. Perhaps a day or two goes by before someone in the personnel department enters that data into the database. The event being recorded, the hire, occurred on Monday and is recorded in the database with an Effective Date of that event date-time. If this employee worked for the company for a year and then left, a second record would be written into the database recording the second event, the termination.
The records in a database are usually organized into tables where each table has the same fields and pertains to the same subject matter. When database tables record events and are stored with event date-times they are referred to as a history. Histories record the events related to a particular subject such as a person, an organization, a position, an object such as an automobile. The history for each individual subject is independent of the histories for others. Each record of a history contains the state of the history between the event date-time of that record and the record with the next highest date. If no records occur after a particular record, then the state recorded by that record applies into the indefinite future. For example, a hire record might record the employment state as “Active.” A termination record might record the employment state as “Inactive.” The state of the employment is “Active” for any date between the hire record and the termination and “Inactive” on the termination date and for any date thereafter.
In some histories, more than one event can take place at the same time, e.g. an employee is promoted and given a salary increase on the same day. When more than one event can occur, the database model usually includes a sequence field or other mechanism so that the two events can be distinguished and an order of occurrence determined. The last record with the same date-time contains the state of the history for the period between that record and the record with the next highest date-time.
Because retrieving the specific record that applies to a point in time results in a complex query, particularly when there are more than one record for the same date-time, many database models contain two fields. One date-time, the “Effective Date” records the date-time of the real world event and the second, the “End Date” is set to either the date-time immediately prior to the following record or a date-time far in the future. Such a data model greatly simplifies the retrieval of the specific record that applies on a date-time. This model can be referred to as the Effective Time Span model.
In the Effective Time Span model, the End Date for a record having no subsequent records is initially set to the indefinite future. If an event occurs subsequent to the Effective Date of the record, the End Date is updated to a date-time one instant lower than the Effective Date of the new record. In this way, there two records generally do not apply to the same time span.
When records are written into a database much later than when the real world event occurred, computer processes often have to account for the lateness of the entry. If an employee's salary change was written into a database after the payroll for the date of the salary change, then the amount the employee was underpaid must be computed when the payroll is run after the record is present in the database. This process is referred to as “retroactive” processing. In order to facilitate retroactive processing, many databases contain a field that records the date-time the record was written to the database. This field is sometimes referred to as the Create Timestamp. With such a field, it is possible to calculate the retroactive period—the gap between the real world event and the time the record was written to the database—and, it is possible for computer processes to detect retroactive records. A payroll process might record each time it runs, and then it would be able to detect records that were entered into the database between processes and determine if they applied to prior periods.
Data entered into databases is sometimes entered incorrectly. When records in histories are corrected, either the incorrect record is rewritten with the correct data or a new history is created and the incorrect record is marked as invalid. If the incorrect record is updated, the record of the inaccuracy can be lost. One purpose of leaving the invalid records in the database is to facilitate processes such as the retroactive process described above. If a previously computed payroll was incorrect, both the incorrect record and the correct record would be called for to compute the effect on the paycheck. A common way to indicate the correction of a record is to have a field, a Record Status, that indicates whether the record is valid or not. The field, Record Status might contain a value “A” indicating that the record is Active and a value “C” indicating that the record is Corrected. In this way, computer processes can retrieve only Active records, Corrected records or both.
Retroactive processing often is facilitated by the addition of a fourth date, an Update Timestamp. This field records the date-time when a record was updated. Initially this field contains the same value as the Create Timestamp, but it is changed whenever the record's End Date changes or it is replaced by a corrected record and its Record Status is changed to Corrected. With an Update Timestamp, it is possible to determine the period that the database was incorrect.
If a record is found to be incorrect and there is no replacement—ordinarily it should be deleted—then in order to support retroactive processing, the record is marked as Corrected and its Update Timestamp can be set to the time of the update. This operation is referred to as a “Logical” deletion.
Four fields, Effective Date, Create Timestamp, Update Timestamp, and Record Status generally are sufficient to enable computer processes to select the records active at any point in time, or over any period and to determine whether the state of a history was incorrect and the period for which it was incorrect. Some queries are more efficient if an End Date is used but this is not essential. Several variations of these five fields with various names are widely used in the art.
With either the four-field or five-field scheme, it is feasible to perform “As of” queries on a database. Such queries might select all the Active records that apply to a point in time or apply to some period. It is also feasible to perform queries that select corrected records that applied to an effective date at an earlier database time. However, it is difficult to perform queries that select all the records that apply to a point in time as of an earlier database time—referred to as a Database Regression. For example, a report is prepared on Jan. 2, 2002 that shows the number of employees in a company as of Jan. 1, 2002. The same report is run on January 15th and shows a different number of employees because the database has been updated with events that occurred before January 1st but had not been applied by January 2nd. The report is run later and shows a different count. As time goes on, the reported number continues to change becoming ever more reflective of the actual state of the real world on the date in question. With the five-field structure, it is difficult to construct queries that select on both event times and database times. This problem can become much more severe when queries are complex involving the coordinated selection of several tables where each has the five-field structure.
Because a database is often changing as transactions are added, running a query at two different times can produce different results, even running a query a few minutes later can produce different results. When several queries are run with the same as of date, as might happen at the end of the month or quarter, the queries might produce different results even though they are similar. Unless the database management system provides an option to eliminate updates that are performed while a query is running (Repeatable Read), erroneous results can be obtained due to the database being updated while a query is running. However, such options can be expensive in large databases. These problems have led many companies to make copies of their databases in order to produce both consistent results and eliminate or incorporate corrected records. Having various versions of databases is both expensive and cumbersome. And if a copy of the database was not made at some point, it may not be possible to recreate the database.
Consequently, there is a need in the art for a data structure that can support retroactive processing, regression of history databases, and repeatable query execution. There also exists a need for a system to produce and to use such a data structure. The present invention meets these needs.