Organizations increasingly rely on computers to retain not only current information but historical information as well. For example, a business may need to know, at any given moment, how many items are left on a store shelf, the latest price of a security, or the value of a financial portfolio. In addition, it might need to know the price of a security as of some time in the past, and how many shares of the security were sold at that time. Such information is typically stored and accessed using one or more databases, e.g., a current database and a log of activities against that database. However, it is frequently more convenient to retain all of this information as part of a transaction time database. A transaction time database retains all past information such that all past database states can be interrogated. To facilitate this, transaction time databases typically associate a time with each data item stored. Thus, the server needs the capability of determining when data has been safely stored.
A key problem when implementing a transaction time database is how best to associate a timestamp with each version of data that is changed by a transaction. This has been pursued in the art. Lazy, after-commit timestamping seems to offer the best solution. However, this approach can incur high costs in execution time if not performed wisely. Moreover, the persistent information needed for the timestamping can grow very rapidly and become expensive to access. Keeping the persistent information at a reasonable size and readily accessible requires a form of garbage collection that deletes information about a transaction once all of its updates have been timestamped.
Garbage collection is a software routine that reclaims space for the general memory pool by searching memory for areas of inactive data and instructions and then deleting them or otherwise moving them out. De-allocating memory after a routine no longer needs it is a tedious task, and programmers often forget to do it or do not do it properly. Performing garbage collection at low cost has likewise been a serious challenge in the art.
One standard way of dealing with timestamping, used in commercial systems that support only a limited form of versioning called snapshot isolation, has been to mark each version of a data item updated by a transaction with an identifier for that transaction (the transaction ID). When a snapshot read executes, it is given a list (called the commit list) of the transactions that committed prior to the snapshot execution, and hence whose versions that snapshot read is entitled to see. Such an approach does not generalize to full transaction time support, because it is not possible to provide a commit list for transactions requesting access to an arbitrary past version of the database based on some real time.
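The commit-list check described above can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of any particular commercial system: the names `Version`, `snapshot_read`, `history`, and `commit_list` are hypothetical, and the version chain is assumed to be an in-memory list ordered oldest to newest.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Version:
    """One version of a data item, marked with its writer's transaction ID."""
    value: str
    tid: int


def snapshot_read(versions: List[Version], commit_list: FrozenSet[int]) -> Optional[str]:
    """Return the newest value written by a transaction in the commit list.

    `versions` is assumed to be ordered oldest to newest (an assumption of
    this sketch). Versions from transactions outside the commit list are
    skipped, because the snapshot is not entitled to see them.
    """
    for version in reversed(versions):
        if version.tid in commit_list:
            return version.value
    return None  # no version of this item is visible to the snapshot


# Hypothetical version chain for one data item.
history = [
    Version("price=10", tid=1),
    Version("price=12", tid=2),
    Version("price=15", tid=3),
]

# Transactions 1 and 2 committed before the snapshot began; 3 had not.
commit_list = frozenset({1, 2})
visible = snapshot_read(history, commit_list)  # "price=12"
```

The sketch also makes the stated limitation visible: the reader must be handed a commit list, and nothing in the versions themselves says which transactions had committed as of an arbitrary real time in the past.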
A second way of dealing with this problem has been explored in research prototypes and papers. It involves replacing transaction IDs with time-based timestamps. There are several variants of this scheme. The one generally considered most attractive is lazy, after-commit timestamping. This permits the timestamp to be chosen late in the transaction's execution. The versions of data items updated by a transaction are originally “stamped” with the transaction ID of the updater. After commit, this transaction ID is replaced by the transaction time assigned to the transaction.
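A minimal Python sketch of this after-commit replacement follows. It is a simplified, in-memory illustration under stated assumptions, not a description of any specific prototype: the names `Version`, `TimestampTable`, `commit`, and `stamp` are hypothetical, timestamps are plain integers, and persistence and logging are omitted entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Version:
    """A data item version: carries a transaction ID until it is timestamped."""
    value: str
    tid: Optional[int]
    timestamp: Optional[int] = None


class TimestampTable:
    """Hypothetical mapping from transaction ID to assigned transaction time."""

    def __init__(self) -> None:
        self.commit_time: Dict[int, int] = {}

    def commit(self, tid: int, time: int) -> None:
        # The transaction time is chosen at commit, late in the execution.
        self.commit_time[tid] = time

    def stamp(self, version: Version) -> bool:
        """Replace the version's transaction ID with its transaction time.

        Returns False if the updating transaction has not committed yet,
        in which case the version is left unchanged.
        """
        if version.timestamp is not None:
            return True  # already timestamped
        time = self.commit_time.get(version.tid)
        if time is None:
            return False
        version.timestamp = time
        version.tid = None
        return True


table = TimestampTable()
v = Version("balance=100", tid=7)   # stamped with the updater's transaction ID
stamped_before = table.stamp(v)     # False: transaction 7 has not committed
table.commit(7, 1000)
stamped_after = table.stamp(v)      # True: v.timestamp is now 1000
```

In this sketch the replacement happens whenever `stamp` is next called on a version, which is one way to defer the work until after commit; when to trigger it is a design choice the passage above leaves open.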
To make this timestamping possible, the mapping of transaction ID to timestamp must be made persistent. But that represents a serious cost. If the information is allowed to simply “pile up”, this table of persisted information will get very large. There is also a risk that storing and accessing the persisted information will be costly. Thus, it is desirable to garbage collect entries that are no longer needed. Prior solutions have proposed a form of reference counting called persistent reference counting, which can itself be expensive, because the timestamping process has normally been thought to require logging so that the timestamping itself is guaranteed to be durable (persistent). Thus, there is a substantial unmet need for a mechanism that facilitates more efficient timestamping in a transaction environment.
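The idea of garbage collecting the mapping by reference counting can be sketched as follows. This is only an in-memory illustration with hypothetical names (`RefCountedTimestampTable`, `record_update`, `stamp_one`); it deliberately omits the persistence and logging of the counts, which is the part identified above as expensive, and it is not the mechanism of any specific prior solution.

```python
from typing import Dict, List, Optional


class RefCountedTimestampTable:
    """Maps transaction ID -> [commit time, number of versions still unstamped]."""

    def __init__(self) -> None:
        self.entries: Dict[int, List[Optional[int]]] = {}

    def record_update(self, tid: int) -> None:
        # Each version written by the transaction adds one reference.
        entry = self.entries.setdefault(tid, [None, 0])
        entry[1] += 1

    def commit(self, tid: int, time: int) -> None:
        self.entries.setdefault(tid, [None, 0])[0] = time

    def stamp_one(self, tid: int) -> int:
        """Return the commit time for one version and drop one reference.

        When the last reference is dropped, the entry is deleted, so the
        table does not grow without bound.
        """
        entry = self.entries[tid]
        if entry[0] is None:
            raise ValueError("transaction has not committed")
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[tid]  # garbage collect the mapping entry
        return entry[0]


table = RefCountedTimestampTable()
table.record_update(42)
table.record_update(42)                # transaction 42 wrote two versions
table.commit(42, 1000)
first = table.stamp_one(42)            # one version timestamped
still_present = 42 in table.entries    # True: one version still unstamped
second = table.stamp_one(42)           # last version timestamped
collected = 42 not in table.entries    # True: entry has been removed
```

In a durable system each change to a count would itself have to survive a crash, which is why the passage above describes persistent reference counting as potentially costly.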