Online Transaction Processing (OLTP) refers to a class of computer database systems that facilitate and manage transaction-oriented applications. As used herein, a transaction refers to a database transaction that immediately processes user requests. In OLTP, records are updated frequently and the design emphasis is on being able to find a small number of records and update them quickly and efficiently. OLTP transactions typically read fifty or fewer records and insert and/or update a smaller number of records.
A data warehouse database is a repository of an organization's electronically stored data. A data warehouse database provides an architecture for the flow of data from an operational system (e.g., an OLTP system) to a decision support system. In some instances, for example, a parallel database, queries can read billions of records at a time. In such systems, data is typically inserted in very large batches (e.g., a million) and updated far less frequently than in OLTP workloads.
Database transactions have a collection of properties called ACID (Atomicity, Consistency, Isolation, Durability) that guarantee a set of distributed reads and writes are processed reliably. Atomicity is the guarantee that either all of the writes of a transaction are performed or none of them are. Consistency guarantees that only valid data will be written to the database. Isolation is the guarantee that other transactions cannot access or see the data in an intermediate state during a transaction. Durability is the guarantee that once the user has been notified of success, the writes of a transaction will survive system failure and not be undone.
Many databases rely upon locking to provide ACID capabilities, especially those optimized for OLTP. Locks are acquired before reads and writes of database data. However, data warehouse databases can have billions of records, so maintaining a large number of locks or escalating to locks that cover ranges of rows is complex and has substantial overhead. A large number of locks also hurts concurrency performance. An alternative to locking is Multi-Version Concurrency Control (abbreviated as MVCC), in which the database maintains separate copies of records that are modified. This allows users to read data without acquiring any locks and is ideal for data warehouse databases. MVCC is a relaxation of the isolation ACID guarantee. MVCC is sometimes referred to as snapshot isolation.
Since OLTP records are updated frequently, the MVCC information varies significantly from record to record. OLTP databases are much smaller than data warehouse databases. Consequently, the per-record MVCC overhead is an immaterial issue for OLTP. On the other hand, in a data warehouse database, the per-record MVCC overhead is problematic. In a data warehouse, almost all the records have the same transaction snapshot versioning (MVCC) information because the insert transactions write large numbers of records and updates are less frequent. The information is very redundant and may become a significant disk space overhead. This overhead also reduces the data read rate.
In view of the foregoing, it would be desirable to provide an improved technique for implementing snapshot isolation to optimize a data warehouse database.