Databases are very important in the modern world. Data is input to the database, and many kinds of data have a time-related value: the database is a time-relational database. For example, Mr Smith works for a company. This database entry shows that he joined on 1st Mar. 1995. That database entry has a time-associated with it: 1st Mar. 1995. The database entry may have been created before, on, or after 1st Mar. 1995. When an enquiry is made, today, of the database it says that Mr Smith joined on 1st Mar. 1995. A second example is the price of goods sold by a business. A price list database shows that product XP2001 costs $100. That is what the database says today.
However, the value of an attribute of a database entry is often not static in time. For example, Mr Smith may really have started a week later. If this is discovered on, say, 10 Jun. 2003, and corrected that day in the database, an enquiry performed on 9th Jun. 2003 would say Mr Smith jointed on 1st Mar. 1995. The same enquiry performed on 11 Jun. 2003 would say he joined on 8th Mar. 1995.
Similarly, the price of goods can change. Product XP2001 has its price set at $100 on, say, 1st Jun. 1996 changed from $100 to $110 on 1st Jan. 1997, a special discount offer had reduced it to $80 on 1st Jun. 1997, and it went up to $120 on 1st Jan. 1998. An enquiry of the database “how much is product XP2001” will give a different answer at different times of making the enquiry.
The above examples illustrate two types of change of value of an attribute (price/start date): one because of an error, now accepted as being an error, and another due to a real change in the value of the attribute.
Other attributes of a data entry can change with time for other reasons. For example Mr Smith joined the company in the accounts department, but was transferred to the marketing department on 1st Jan. 1999. The enquiry “who is in the accounts department” will give a different answer if it is asked at different times.
How the above types of problem (attributes of data entries changing with time, and the, consequential effect of that on data structure, and the need to be able to turn the clock back and see what the answer would have been had the question been asked at a different time) are dealt with is a difficulty for database designers, especially designers of relational database where relational rules between different data entries exist, and need to be made consistent. Creating an audit trail is necessary before changes are made—so as to preserve the previous picture. Additional application software, such as audit and history tables can be built to do this automatically as a change is made—but that software takes up memory, and the data saved/stored in the audit trail takes up disk storage.
The problem is resolved currently by having a set of relational rules that control what can be done allowably in a computer with a specific data entry (or data entity), depending upon the attributes of the entry and the rules relating to those attributes, and also by arranging the data entry so as to be able to roll-back the database to an earlier time. This necessitates setting up the structure of the database originally in such a way that an attribute that can change with time has a series of value entries associated with different times, and flags to enable a search command to identify and use the correct attribute value when a “search as of time X” command is used.
This works, but has limitations. Firstly, if an attribute in the database is not set up initially to be able to have different values at different times, then it is not possible to roll back the time for that value (short of locating all database entries that have that attribute, modifying the data structure, and entering values at different historical times for them—which is an appalling prospect).
Secondly, the relational rules required are complex, and take up a lot of processing power of a computer when a query is made; and/or when a change to an attribute is made.
Furthermore, some changes just cannot be made to the database (for example Update Cascade). It is possible to implement Delete Restrict and Delete Cascade, but not, for example Update Cascade. There are no foreign key rules for update: it is simply not allowed.
The above is a relatively high-level discussion of some of the issues in the background to the invention. Their now follows a more technical level discussion.
The majority of commercially available non temporal databases use essentially the teachings of Dr. E. F. Codd, the leading authority on relational database technology and the originator of the relational model, who sets out the rules for a relational model in a paper entitled “Codd's 12 Rules for Relational DBMS” published in The Relational Journal—Issue 1.
A relatively brief and concise definition of the relational model itself, appears in the article “RM/T: Extending the Relational Model to Capture More Meaning.” (Chapter 2 of “The Basic Relational Model) in the Association for Computing Machinery's “Transactions on Data Base Systems” (December 1979). The reader is directed to read these now, and the content is hereby incorporated by reference.
However, the work done by Dr. E. F. Codd does not incorporate time in the relational model. In fact, the database technologies that have been built according to Codd's model hold data at an instant in time i.e. current time or now. Incorporating time into database models is currently an area for debate and the modelling of temporal data and temporal databases has been the subject of much work, for many years, by both academia and the biggest computer/software companies in the world. We refer the reader to the publication “C. J. Date, An Introduction To Database Systems”, Volume 1, Fifth Edition, published by Addison-Wesley Publishing Company, and in particular to pages 717 to 720 where, there is a discussion of time in databases. The reader is directed to read that now, and the content of pages 717 to 720 is hereby incorporated by reference.
In general, the research and commercial databases are based on an approach where each tuple (row) representing a database object is augmented by two attributes to denote a lifespan i.e. from the time of the insert of the data object to the time of the deletion of the data object. These attributes are typically denoted as a start-date and an end-date. There is a great deal of published work on how the attributes might be managed, and how they relate to each other. How to initiate updates and deletions of related entities/objects in a database without creating logical inconsistencies in the data is a big area for research.
Indeed, the number and complexity of the algorithms that are running in the background for a relational database with entries that have a temporal dimension is ever increasing, and uses up quite a lot of processing power of the computer. This can slow data transfer rates (e.g. in networked/telecommunicated systems).
Current Temporal Relational models and theories allow physical update and delete actions and as a result the maintenance of data integrity is a complex task that is magnified by the introduction of time.
In ‘An Introduction to Database Systems’, C. J. Date discusses the topic of time with respect to database systems. He points out that a considerable amount of research has gone into the problem in recent history. He also notes that although the theory of modelling time has been explored, there are relatively few papers that concern themselves with implementation and the issues surrounding the actual delivery of a relationally consistent, time based system.
Most research is based on extending a schema, FIG. 7a, by adding a Start attribute 701 and End attribute 702 to either a tuple 705 or an attribute 706. These Start and End attributes are managed by the system. This prior art research concentrates in the main on the management of these attributes as data is updated, deleted and inserted. This research typically restricts itself to using a single-entity with simple entity attributes such as surname to demonstrate the theory. Even with this unrealistic, restricted view of data the complexities of this research soon manifest themselves in the form of discussions and theoretical dissertations on the management of these simple attributes. Unresolved problems remain such as how to manage updates and deletions of these simple models without creating logical inconsistencies in the data such as current duplicates, value equivalent duplicates, sequenced duplicates and non-sequenced duplicates. These and many other topics are the substance of thousands of theoretical dissertations in the temporal database community.
The prior art rarely considers the next logical step of more realistic examples involving multiple entities related by foreign keys such as employee and department or order and orderline. Applying the available prior art research to more realistic data models with multiple related entities means that the problems of current research are magnified greatly and so are often ignored by the theorists as they struggle to provide a complete solution to even the most simple examples.
C. J. Date concludes that a single pair of Start/End attributes is easily seen to be inadequate, because it is necessary to distinguish between effective time and update time and he surmises that there is a need, depicted in FIG. 7a, for an effective (or valid) Start/End pair of attributes 701, 702 and an update (or transaction) pair of attributes 703, 704 for each attribute in a tuple 706. Furthermore, he states that the effective Start/End pair may or may not be hidden from the user, but either way there must be a means of performing retrievals based on those attributes, and there must also be a means of updating them.
The conclusion of much of this research is that when one comes to implement the theory then certain, as yet unavailable, constructs are necessary such as the use of assertions, complex check constraints, multiple select statements in triggers, extensions to the SQL-92, and creation of new SQL3 standards.
We are aware of U.S. Pat. No. 6,233,573 that relates to dimensional databases having a temporal dimension. This has a discussion of the prior art and problems that complements that given above. We hereby incorporate by reference from column 1, line 24 to column 4, line 2, of that patent, and the reader is directed to read that discussion of the prior art, and indeed the whole patent U.S. Pat. No. 6,233,573.
Whilst U.S. Pat. No. 6,233,573 addresses similar general problems to those of the present invention, it is written in the context of providing a temporal extension to existing dimensional database theory such that one can accurately select records from a dimensional database that depend on a time of a change or effective date of a value of an attribute in a table of a dimensional database.
We are also aware of U.S. Pat. No. 5,440,730. This was filed over 10 years ago, but has a useful discussion of the prior art, and addresses allied problems to the present invention. We hereby incorporate by reference from column 1, line 7 to column 2, line 40 of U.S. Pat. No. 5,440,730 and direct the reader to read the whole of that patent now.
Whilst U.S. Pat. No. 5,440,730 recognises difficulties in the current approach to how time is handled in relational database, it is mainly concerned with the provision of an efficient indexing scheme for supporting high-level temporal operators especially when the search involves time intervals i.e. rather than time points. For example, where it is desired to retrieve object versions that are valid during a given time period, e.g. the names of all employees who worked for the company during 1985, this time index will lead directly to the desired versions, i.e. the names, without requiring the search of a version index for each individual object, i.e. employee, separately.
A further fact associated with existing relational databases with a time dimension is that access to the database is session-based: only one person has access at a time: only one person can change things at a time. Application and session cookies are often used to control who has access to what and when. As the number of people who want to access the database grows there may be times when getting control of the database to access it becomes slow. This may be especially so in, say, internet accessed databases.
A further fact is that if a database manager is in the process of changing a large number of attributes (e.g. the price of 10,000 stock items), users may need to be locked out of the system for a long time, for example tens of minutes, or even for hours. Typically such revisions to the entries in a database are performed in the middle of the night so as to minimise disruption to users. However, with Internet access to a web site spreading the usage of a web site database over many time zones it can be difficult to find a time when few people are using the database.
U.S. Pat. No. 6,363,387 discloses a system for enhancing concurrency of a database using row level locking. It discloses prior art, and we hereby incorporate by reference the prior art discussion of its “background of the invention”. Section of that patent, from the first line of page 1 up until the start of the “screening of the invention”.