Temporal data is data that keeps track of changes over time. It is data, more precisely, that keeps track of changes along either one or two temporal dimensions.
One of these temporal dimensions is called “valid time” by computer scientists. In valid time, modifications to data in a database reflect changes happening in the world around us, and that we wish to keep track of. The second of these two temporal dimensions is called “transaction time” by computer scientists. Transaction time keeps track of when data is initially entered into a database. In addition, transaction time keeps track of modifications to data in a database that do not reflect anything that is happening in the world around us. Instead, these modifications are made to adjust the data in some way, independent of what is happening to what the data represents. For the most part, these adjustments are made to correct mistakes found in the data.
Attempts to manage temporal data that varies in valid time may be found in many business applications. Research into temporal data that varies in both valid time and transaction time has been ongoing in the computer science community for several decades. Although there is still no de jure SQL standard for managing temporal data, DBMS vendors have begun to add support for temporal data to their DBMSs and related products.
Before proceeding, we will introduce some needed terminology. We prefix most of our definitions of terms with “(def)”.
(def) temporal data: data that keeps track of changes over time.
(def) temporal table: a table that contains temporal data.
(def) temporal database: a database that contains at least one temporal table.
(def) temporal dimension: a semantic interpretation given to points in time and periods of time. Points in time and periods of time that belong to different temporal dimensions cannot be compared; they are incommensurable. Points in time and periods of time in the same temporal dimension can be compared as to their relative positions along the timeline which that temporal dimension interprets. For example, one time period may end before another time period begins, or may overlap that other time period, or may be contained within that other time period, or may end at the same time as that other time period, and so on.
(def) non-temporal table, conventional table: a non-temporal table, which we will also call a “conventional table”, is a table whose rows stand for statements about what we currently assert the things those rows represent are currently like. Each row describes what the object it represents is currently like; in other words, it describes the current state of that object. Also, by its presence in the table, each row represents the assertion that the statement made by that row is true. If we believed that the statement were not true, we would remove the row from the table.
In a conventional table, anything that is represented by a row in that table is represented by only one row. A conventional table contains no history about the things which its rows represent. It does not contain any history about erroneous data that was later corrected. It also contains no data about the future, i.e. about what the things its rows represent may be like at some future point in time.
(def) conventional database: a database all of whose tables are conventional tables.
(def) object: anything that can be distinguished from other instances of its type, and that can change over time. So, for example, specific customers, products, employees and invoices are all objects. A relationship represented by a row in an associative table, such as a table relating customers to salespersons, is also an object. An event is not an object, because once an event has taken place, it is over and cannot change.
(def) type of object: the kind of thing something is. A type of object is represented, in a database, by a table. Each instance of that type is represented, in a conventional database, by a row in that table.
(def) statement: a description of something. In a natural language, a statement is made by writing or uttering a declarative sentence in a “normal” context, i.e. a context in which it is understood that by writing or uttering that sentence, one is also willing to affirm that what it says is true.
Each row in a table makes a statement about the object it represents. For example, consider a row in a non-temporal table of customers, where the columns of the table consist of a customer number primary key, a customer last name, and a customer status code, and the row consists of the values, respectively, “C123”, “Jones” and “VIP”. This row makes the statement “There exists a customer with customer number ‘C123’, last name ‘Jones’ and status code ‘VIP’”. The presence of this row in its non-temporal table is the assertion that this statement is true.
(def) uni-temporal data: data that tracks changes in one and only one of the two temporal dimensions.
(def) valid time, effective time: the temporal dimension that computer scientists call “valid time”, we will call “effective time”. As stated above, in valid time, modifications to data in a database reflect changes happening in the world around us, and that we wish to keep track of.
(def) uni-temporal version data, uni-temporal version table: uni-temporal data that tracks changes in effective time we will call “uni-temporal version data”, and we will call temporal tables that contain such data “uni-temporal version tables”.
(def) version: a row in a uni-temporal version table is a version of the object it represents. Sometimes, we will use the word “versioned” instead of “version” in these expressions.
(def) transaction time, assertion time: the temporal dimension that computer scientists call “transaction time” is semantically a proper subset of the temporal dimension that we will call “assertion time”. As stated above, in transaction time, modifications to data in a database do not reflect anything that is happening in the world around us. Instead, these modifications are made to adjust the data in some way, usually in order to correct mistakes found in the data. We allow data to be associated with future periods of assertion time, and it is this association which extends the semantics of assertion time beyond the semantics of transaction time. But except in this context, we do not allow data to be associated with future periods of assertion time. When we do not, “assertion time” and “transaction time” are equivalent terms; all instances of the one are instances of the other. See also the later section entitled Fundamental Concepts: Transaction Time and Assertion Time.
(def) uni-temporal assertion data, uni-temporal assertion tables: uni-temporal data that tracks changes in assertion time we will call “uni-temporal assertion data”, and we will call temporal tables that contain such data “uni-temporal assertion tables”.
(def) assertion: a claim that a statement is true. A row in a uni-temporal assertion table is an assertion of the statement made by that row, about the object it represents.
In the phrases “uni-temporal version” and “uni-temporal assertion”, “uni-temporal” may be unnecessary, and so we may sometimes drop it.
(def) bi-temporal data, bi-temporal table: data that tracks changes in both temporal dimensions. We will call tables that contain bi-temporal data “bi-temporal tables”. Rows in bi-temporal tables are both versions and assertions.
bitemporal, bi-temporal: we note that, in the prior art, the term “bitemporal” is sometimes used instead of the term “bi-temporal”.
We proceed now to a review of the prior art.
2000: Developing Time-Oriented Database Applications in SQL.
R. T. Snodgrass. Developing Time-Oriented Database Applications in SQL (Morgan-Kaufmann, 2000).
This book, written by the leading researcher in the field of temporal data management, explains how to support temporal data management using Database Management Systems (DBMSs) and SQL current at the time of publication of the book. It describes an implementation of the basic concepts of temporal data management, an implementation which includes three kinds of temporal table. The first kind is called a “valid-time state table”, and is the way that the invention described in this book manages temporal data which keeps track of changes in valid time. The second kind is called a “transaction-time state table”, and is the way that the invention described in this book manages temporal data which keeps track of changes in transaction time. The third kind is called a “bitemporal table”, and is the way that the invention described in this book manages temporal data which keeps track of changes in both valid time and in transaction time.
This book is an invaluable source of SQL code fragments that illustrate the complexity of managing bi-temporal data and, in particular, that illustrate the complexity of writing entity integrity and referential integrity constraints against temporal data. It constitutes a reduction to then-current practice of the underlying concepts of temporal data management.
We will refer to these temporal concepts, and the reduction to practice as described in this book, as the “baseline model of temporal data management” (or, for short, the “baseline temporal model”). It constitutes the invention on which our own invention is claimed to be an improvement.
Our invention is an improvement on the baseline temporal model in several ways. First, our invention uses a single canonical schema to express all temporal data. This schema is a bi-temporal schema, and so all temporal tables in our invention are bi-temporal tables. In our invention, therefore, temporal data which changes only in valid time, or temporal data which changes only in transaction time, are not kept track of by means of tables with different schemas, tables which use only one of the two temporal dimensions. With our invention, either kind of uni-temporal data can be maintained as bi-temporal data for which one of the time periods is not specified on maintenance transactions and instead takes on a default value. With our invention, either kind of uni-temporal table can be manifested as a queryable object by defining a uni-temporal view on the underlying bi-temporal table.
Our invention, in contrast to the baseline temporal model, also enforces entity integrity and referential integrity constraints on temporal data at the time that data is being maintained. It does so by means of insert, update and delete transactions. Consequently, in contrast to the baseline temporal model, a temporal database maintained by our invention will never contain temporal data that violates these constraints.
Our invention, in contrast to the baseline temporal model, also provides a surrogate-valued unique identifier of the objects represented by rows in tables. This provides a common syntax for uniquely identifying those objects, which is the same in all temporal tables in the database. With this common syntax which, in our preferred embodiment is a single surrogate-valued column, the primary keys of all temporal tables will be syntactically identical. This syntactic identity of the primary keys of all temporal tables simplifies the program code required to manage all forms of temporal data.
Our invention, in contrast to the baseline temporal model, also expresses all temporal data requirements as metadata. It thereby avoids the need to express those requirements in relational logical data models, or in entity-relationship logical data models, or in physical data models. It replaces a design specification in data models with a declarative specification in metadata.
Our invention, in contrast to the baseline temporal model, also extends the concept of transaction time to cover data which exists in future transaction time. We call maintenance transactions that specify future transaction time “deferred transactions”, and we call the rows that result from applying them to a temporal database “deferred assertions”. With deferred assertions, we eliminate the need to maintain separate datasets of transactions which are not yet ready to be applied to their target tables.
2002: Temporal Data and the Relational Model.
C. J. Date, Hugh Darwen, Nikos Lorentzos. Temporal Data and the Relational Model. (Morgan-Kaufmann, 2002).
While the main focus of our patent application, and also of the baseline temporal model, is row-level management of bi-temporal data, the main focus of Date, Darwen and Lorentzos' book is row- and column-level management of uni-temporal data. While the main focus of our patent application, and of the baseline temporal model, is on implementing temporal data management with current DBMSs and current SQL, the main focus of this book is on describing language extensions that contain new operators for manipulating versioned data. On both these counts, then, this book is essentially orthogonal to this patent application. We include it here because of the prominence of its authors.
2004: Oracle 10 g Flashback Queries and Workspace Manager.
Oracle 10 g extended the flashback queries of Oracle 9i to retrieve all the versions of a row between two transaction times, and allowed tables and databases to be rolled back to a previous transaction time, discarding all changes after that time.
However, Oracle's 10 g flashback queries, like Oracle's earlier 9i flashback queries, are queries based on restoring data as of a past point in time, and then rolling forward to a designated later point in time. Like 9i flashback queries, 10 g flashback queries do not address the problems involved in creating and maintaining temporal data.
With 10 g Workspace Manager, Oracle began to address the issues involved in creating and maintaining temporal data. The Oracle 10 g Workspace Manager includes valid-time support, transaction-time support, support for bi-temporal tables, and support for referential integrity relationships between temporal tables.
Our invention, in contrast to this work, manages all temporal data with only one kind of temporal table, that being a bi-temporal table, and it provides a method of managing all temporal data in a single table for each type of object represented, e.g. of managing all temporal data about customers in one customer table. When this is not done, as it is not done in any of Oracle's products or patents or patent applications of which we are aware, then in order to respond to a query about specific objects, as data located in one or two temporal time periods describes those objects, it may be necessary to assemble, or “re-constitute” (to use Oracle's own terminology) the requested data out of a repository of “raw material”. This process is, conceptually, the same as the decades-old process of re-constituting desired data by restoring a database backup, and then using a transaction log to roll forward to a desired point in time.
Our invention, in contrast to this work, also eliminates the need to manage datasets which are collections of transactions not yet ready to be applied to production database tables. Oracle's workspaces are precisely such collections of transactions, and they are managed as datasets which are distinct from the production tables to which those transactions are intended to later be applied.
2005: Database System that Provides for History-Enabled Tables.
United States Patent Application 20070130226
Filed: Dec. 1, 2005
This prior art describes techniques for history enabling a table in a database system so that past versions of rows of the history-enabled table are available for temporal querying. The table is history enabled by adding a start time column to the table and creating a separate history table for the historical data. The start time field's value in a row of the history-enabled table indicates when the contents of the row last changed. The rows of the history table are copies of rows of the history-enabled table that have been deleted or updated. The rows include end time fields whose values indicate when the row was updated or deleted.
This prior art describes the management of data by associating it with only one time period. But first of all, if this time period were clearly valid time, or clearly transaction time, then it would describe only a method of managing uni-temporal data, whereas our invention is a method of managing bi-temporal data which fully subsumes the management of uni-temporal data.
Secondly, however, this prior art does not distinguish changes to data which were made to reflect changes to the objects represented by rows in a table, from changes to data made to correct errors in that data. It thus fails to distinguish valid time from transaction time, simply using transaction time as its single temporal dimension. It does not support either temporal dimension because it indiscriminately supports both.
Also, even if this prior art did describe a method for managing a uni-temporal history of changes to a table, it does not describe any method for managing data which exists in future effective time, or data which exists in future assertion time.
1997: An IBM Insurance Application Architecture Certified Bi-Temporal Database.
This database, designed and implemented by one of the authors of this patent application (Weis) supports the management and querying of bi-temporal tables. However, to carry out any modification to those bi-temporal tables, each of the physical transactions required to complete the modification must be “hand-written”. Thus a single modification, e.g. a request to extend the effective start date of an insurance policy back to the beginning of the calendar year, might easily require half-a-dozen or more physical transactions, submitted to the DBMS in a specific sequence, and constituting an atomic unit of work. Errors in writing sets of these transactions have occurred frequently even though only experienced DBAs and SQL coders are used to write these transactions.
Our invention, in contrast to this work, permits modifications to temporal data to be expressed as single insert, update or delete statements which we call “temporal transactions”. Our Asserted Versioning Framework (or, for short, “AVF”), shown as 111 in FIG. 1, accepts these temporal transactions and translates them into the multiple physical insert, update and delete statements which carry out the modification. This eliminates the frequent errors which appear when temporal modifications must be hand-written as sets of physical transactions.
2007-2009: The “Time and Time Again” Series.
This is a bi-monthly series of articles written by the authors of this patent application (Johnston and Weis). The series began in the May 2007 issue of DM Review magazine (now Information Management), and ran continuously in DM Direct (now Information Management Direct), up to the time of our preliminary patent application which was filed on Jun. 24, 2009.
One of the implementations in this patent application introduces the concept of an episode as a managed object, i.e. as an abstract data type, defined and manipulated by our invention. Episodes are first discussed in this prior art in Part 4, published in June 2007. But this and later discussions of episodes in this series of articles describe episodes in the context of a single temporal dimension, not in the context of bi-temporality.
In particular, in Part 19 (February 2008), we do provide a clear statement that episodes are the parent managed objects in temporal referential integrity relationships. We say that temporal referential integrity “prevents a child row from being effective before or after the continuous time range of an episode of the parent object.” This is a true statement about what the temporal referential constraint is, but nowhere in this series do we describe a process for implementing the constraint checking.
For these two reasons, we conclude that the use of episodes, in a bi-temporal context, to enforce temporal referential integrity is both novel and unobvious with reference to this prior art.
2008: Presentation Made to the Manhattan Erwin User's Group.
This is a presentation made on June 3rd to the ERwin User's Group in Manhattan. It contains no material not presented in the “Time and Time Again” prior art described above.
2009: Oracle 11 g Workspace Manager.
The Oracle 11 g Workspace Manager is Oracle's current implementation of support for the management of temporal data that is not yet ready to apply to a database. It is a way of supporting what we call “pending transactions”.
(def) pending transactions: collections of transactions that have been written but not yet submitted to the DBMS to be applied to a database.
Oracle's workspaces are hierarchical collections of pending transactions which can be modified, by means of insert, update, and delete transactions, without modifying the table from which they may have been originally extracted, and to which they will eventually be applied if they are not discarded.
However, this method of managing pending transactions creates many problems of its own. In particular, by distributing temporal data across different kinds and instances of physical and logical datasets, it separates this category of temporal data from other uni-temporal or bi-temporal data which exists in database tables. Thus, once a workspace is approved as containing data that the business wishes to include in those tables, IT technical personnel must begin by manipulating those datasets, and must note and resolve any inconsistencies that may have been introduced into the data.
Only after this work has been done will those technical personnel be able to move the designated data into the appropriate database tables. The latency incurred between the approval of designated workspace data, and its availability to the business user as rows in production tables, may thus be a matter of hours or even days.
In our invention, all temporal data is contained in one place, that being the temporal database tables managed by this invention. This includes the results of immediately applying all database transactions instead of managing some transactions in separate datasets. Those results are stored as rows in those tables, but the rows which correspond to pending transactions are not accessible to normal queries until the future point in assertion time associated with them becomes current. In some cases, these pending transactions create and modify data that will not become current for perhaps decades or centuries. This data will never become current “by accident”, i.e. by the mere passage of time. Instead, it will not become current unless and until it is explicitly approved to become current. Once approved, it is “moved up” from that decades or centuries from now period of assertion time, to a point in assertion time that may be only a few seconds from the present moment. This “move up” of this approved data is a matter of changing a date (or timestamp) on one or more rows already in the database table they belong to. Thus, once approved, the latency involved in changing the status of those rows to current data, accessible to normal queries, is a matter of seconds or milliseconds. Moreover, even prior to that approval, our invention makes it possible to retrieve that pending data, alone or in combination with other temporal data in the same production tables, using SQL queries that a business user can write.
Oracle's version tables are one kind of uni-temporal table. They are what we call uni-temporal assertion tables. Their valid-time tables are another kind of uni-temporal table, the kind we call uni-temporal version tables. Together with bi-temporal tables, this means that there are three categories of temporal table in Oracle databases that must be managed, even if we do not count workspaces of pending transactions. In our invention, all temporal data is managed in bi-temporal tables, and those bi-temporal tables share a common canonical schema, thereby simplifying the maintenance and querying of that data and those tables.
2009: Temporal Relational Database Management System.
United States Patent Application 20090248727
Filed: Dec. 2, 2008.
This patent application describes a method for managing changes in both data values and in database schemas. The basic approach is to store a history of changes in a transaction log in which each transaction is associated with points in time in several temporal dimensions, including valid time and transaction time. Then, whenever a query is submitted to the DBMS, the DBMS “re-constitutes” both the schemas which interpret the data values, and the data values themselves, from the transaction log. By specifying different points in time or periods of time in these queries, the DBMS is able to support bi-temporal data.
Our invention does not address the problem known as “schema evolution”, and so does not attempt to manage changes to database schemas.
The method described in this patent application, of storing data values as transactions, and then producing result sets for queries by processing an appropriate subset of transactions in response to each query, is not used in our invention.
Instead, our invention stores the results of transactions in the tables they update, and therefore our invention does not have to dynamically “re-constitute” the data requested by a query. It stores the results of transactions whose data describes the past, present or future states of the objects represented by the data in those tables. In this way, it supports valid time. It also stores corrections to data already entered, but keeps a record of the corrected data also. In this way, it supports transaction time.
In one of our implementations, our invention also accepts transactions which would otherwise have to be stored and managed in what are usually called “batch files”, “batch tables” or other collections of transactions which the business has written but is not yet ready to apply. By accepting these transactions as soon as they are written, and storing their results in the tables which are their targets, our invention eliminates the substantial costs of managing such collections of transactions. By associating the results of those transactions with a future period of time in the temporal dimension we call “assertion time” and which is a semantic extension of the temporal dimension which computer scientists call “transaction time”, those results remain effectively invisible to normal queries, but will automatically become visible when enough time has passed that their transaction times become current.