A database management system concurrently executes a large number of transactions submitted from multiple users through a network. However, if multiple transactions are executed on the same data items concurrently, data inconsistency may occur. Therefore, in order to avoid any data consistency to occur, the database management system uses concurrency control to execute multiple transactions.
<Serializable Schedule>
In order to execute multiple transactions without causing any data inconsistency, the easiest method is to execute each transaction one by one. This method is called serial scheduling, in which execution of a subsequent transaction cannot be started until the execution of a current transaction is completed.
The problem of serial scheduling is that it limits concurrency or interleaving of operations. In a serial schedule, if a transaction waits for an I/O operation to complete, the CPU processor cannot be switched to another transaction, thus wasting valuable CPU processing time and making serial schedule generally unacceptable.
Thus, in order to execute multiple transactions concurrently without causing any data inconsistency, serializable scheduling is used.
In order to achieve a serializable schedule, a locking method is used for data items to be processed by multiple transactions. There are several types of locks such as shared lock and exclusive lock. A shared lock is a read lock which is used by a transaction to only read a data item; thus if one transaction is using a shared lock to read a data item, other transactions can read the data item but cannot update or delete the data item until the shared lock is released from the data item. An exclusive lock is a write lock which is used by a transaction to read and write a data item; thus, if one transaction is using an exclusive lock to update a data item, other transactions cannot read, delete, or update the data item until the exclusive lock is released from the data item.
However, only using locks may cause a data inconsistency between multiple transactions depending on some locking orders. Therefore, if multiple transactions need to lock the same data items, a method of creating a dependency graph representing the locking order for the transactions is used. If there is a cycle in the dependency graph, this means that there is a data inconsistency between some of these transactions; therefore, all of the transactions in the dependency graph are aborted, and the aborted transactions are then restarted from the beginning. If there is no cycle in the dependency graph, all of the transactions are executed correctly without causing any data inconsistency.
However, each time a cycle appears in the dependency graph, transactions are aborted repeatedly and thus this method is inefficient. Moreover, the serializable schedule may not be available even if the transactions are restarted many times. Therefore, there are protocols that guarantees serializable schedule such as two-phase locking protocol, optimistic locking protocol, timestamp protocol, multi-version protocol, etc.
(1) Two-Phase Locking Protocol
The two-phase locking protocol divides lock operations into two phases: a growing phase, during which acquiring of new locks on data items is allowed but none can be released; and a shrinking phase, during which existing locks can be released but no new locks can be acquired; thus, acquiring a lock again after releasing a lock is not allowed.
Also during the growing phase, upgrading of locks is allowed in addition to acquiring locks, and during the shrinking phase, downgrading of the lock is allowed in addition to releasing locks.
In order to ensure a serializable schedule, all transactions in a schedule need to be executed with the two-phase locking protocol.
However, in the two-phase locking protocol, if multiple transactions hold the locks required by each other and any of their operations cannot be performed, a deadlock occurs. In order to remove the deadlock, one of transactions holding its locks needs to be aborted, rolled back, and resubmitted to the transaction processing system.
In addition, if other transactions read the value of the data item written by the aborted transaction, these transactions are also aborted, rolled back, and resubmitted to the transaction processing system. The effect of aborting a transaction causes another transactions to abort in a chain reaction is called cascading rollback.
In order to avoid cascading rollback, a strict two-phase locking protocol can be used. This method has an additional condition that a transaction does not release its first lock in the shrinking phase until the transaction is committed or aborted.
While this method prevents cascading rollback, the time of holding locks by each transaction becomes longer.
<Problem of Two-Phase Locking>
In the two-phase locking protocol, if the number of transactions to be executed concurrently increases, a lock contention may occur between multiple transactions, and thereby a deadlock can easily occur. As a result, many transactions are aborted and rolled back to remove the deadlock, thus causing transaction processing speed to decrease.
In the two-phase locking protocol, if multiple transactions are performed on the same data item, the same data item is locked by each transaction repeatedly so that while one transaction is holding a lock on the data item, other transactions must wait until that lock is released.
The two-phase locking protocol uses an operating system process to execute each transaction. However, if the operation system switches processes while one transaction is holding a lock on a data item, the other transaction requiring to process the data item cannot acquire a lock to the data item even if it has received processing time from a CPU (microprocessor); then, if the operating system switches the processes again, the other transaction is not able to perform any operation. Therefore, if the number of transactions to be executed concurrently increases, this situation occurs repeatedly, slowing down the transaction processing speed significantly.
(2) Optimistic Locking Protocol
The optimistic locking protocol executes each transaction by assuming that there is no contention in executing the transaction, and then checks for any contention when the transaction ends. If there is contention, it aborts and rollbacks the transaction. If there is no contention, it locks the data items to be written, performs the write operations, and commits the transaction. Since this method does not hold any locks while processing a transaction, the time for acquiring locks is only a short period just before committing the transaction.
<Problem of Optimistic Locking Protocol>
Similar to the problem of the two-phase locking protocol, there are problems in the optimistic locking protocol such as, if the number of transactions to be executed concurrently increases, the number of transactions that compete to update data items increases so that the number of transactions that abort and rollback also increases.
(3) Timestamp Protocol
In order to execute multiple transactions concurrently, the timestamp protocol assigns a timestamp (arrival time) to each transaction and executes the multiple transactions in the order from the transaction of smaller timestamp (early arrival time) to transaction of larger timestamp (late arrival time).
Each data item to be processed by a transaction is also given a read timestamp and a write timestamp; then the transactions are executed based on the following rules.
When a transaction performs a read operation on a data item, the timestamp of the transaction and the write timestamp of the data item are compared. If the timestamp of the transaction is smaller, the transaction is aborted because the value of the data item to be read by the transaction is already lost. Otherwise, the transaction is executed and the read timestamp of the data item is set to the timestamp of the transaction.
When a transaction performs a write operation on a data item, the timestamp of the transaction and the read timestamp of the data item are compared. If the timestamp of the transaction is smaller, the transaction is aborted because another transaction, which supposed to read the data item after being written by the first transaction, read that data item first.
Moreover, if the timestamp of the transaction is larger than the read timestamp of the data item and smaller than the write timestamp of the data item, the transaction is aborted because the transaction has missed the time to perform its write operation. Otherwise, the transaction is executed and the write timestamp of the data item is set to the timestamp of the transaction if the timestamp of the transaction is larger than the write timestamp of the data item.
The aborted transactions mentioned above are rolled back, resubmitted to the transaction processing system, and then given new timestamps.
Timestamp protocol can use the technique called Thomas's write rule to reduce the number of transactions to be aborted.
With this technique, when a transaction performs a write operation on a data item, if the timestamp of the transaction is greater than the read timestamp of the data item and smaller than the write timestamp of the data item, the transaction has missed the time to perform its write operation; however, if the database and other transactions are not affected, the transaction ignores the write operation and continues its execution.
<Problem of Timestamp Protocol>
Since the timestamp protocol does not use locks, a deadlock does not occur. However, if the number of transactions to be executed concurrently increases, the multiple transactions may not be executed for a data item in the right order, thus causing many transactions to be aborted, rolled back, and resubmitted to the transaction processing system. If such operation occurs repeatedly, a problem such as transactions never completing occurs.
(4) Multi-Version Technique
The multi-version technique is a method that keeps track of multiple values of a single data item by storing the previous value of the data item when a transaction writes the data item. Then, when multiple transactions are each executed on the same data item, the appropriate version of the data item is selected for each transaction to reduce the number of transactions to be aborted. There is a multi-version technique applying the two-phase locking protocol as well as the timestamp protocol.
<Problem of Multi-Version Technique>
The problem of the multi-version technique is that additional storage space is required for storing multiple values for each data item.
[Patent Document 1]
    Japanese Unexamined Patent Application Publication No. Hei 07-084864[Patent Document 2]    Japanese Unexamined Patent Application Publication No. Hei 07-200372[Patent Document 3]    Japanese Unexamined Patent Application Publication No. Hei 10-320258[Patent Document 4]    Japanese Unexamined Patent Application Publication No. 2002-32249[Patent Document 5]    U.S. Pat. No. 5,280,619 Jan. 18, 1994
System for accessing shared data using a serialization graph constructed from a history file showing completed locking dependencies between transactions
[Non-Patent Document 1]
    Hiroyuki Kitagawa, Database System, pp. 147-173, Textbook of Information System Series Volume 14, SHOKODO CO., LTD., First Print Published in Jul. 10, 1996[Non-Patent Document 2]    Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems Second Edition, pp. 527-575, Addison-Wesley Publishing Company, 1994[Non-Patent Document 3]    Kazuimi Kanehiro, Knowledge and Practice of Database Carefully Taught By an Expert, pp. 88-105, SHOEISHA. CO., LTD., 2007