The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Some database systems may create errors when offering concurrent access to multiple readers and writers. For a simple example used to illustrate concurrent access principles but not intended to describe real world practices, when a wife and a husband both access their joint checking account through different computers at the same time, their concurrent transactions may create errors. The wife's mobile phone and the husband's tablet computer each concurrently read their joint account balance as $1,000. When the wife's mobile phone processes her request to transfer $50 from a savings account to the joint checking account, the wife's mobile phone adds the $50 deposit to the previously read $1,000 balance to result in a new balance of $1,050, which the wife's mobile phone writes to the bank's database. When the husband's tablet computer processes his request to pay $50 electronically to a creditor, the husband's tablet computer subtracts the $50 payment from the previously read $1,000 balance to result in a new balance of $950, which the husband's tablet computer writes to the bank's database. Although the equal deposit and withdrawal of $50 should have resulted in the same balance of $1,000 that preceded these two transactions, the second transaction used stale data and overwrote the result of the first transaction, such that the database stores a balance of $950 and the $50 deposit is lost.
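The sequence of reads and writes in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `database`, the key `joint_checking`, and the variable names are hypothetical stand-ins for the bank's database and the two devices, not a description of any real system.

```python
# Hypothetical in-memory stand-in for the bank's database (an assumption).
database = {"joint_checking": 1000}

# Both devices read the balance before either one writes.
wife_read = database["joint_checking"]
husband_read = database["joint_checking"]

# The wife's mobile phone adds the $50 deposit to its copy and writes the result.
database["joint_checking"] = wife_read + 50
balance_after_wife = database["joint_checking"]

# The husband's tablet subtracts $50 from its stale copy and overwrites the deposit.
database["joint_checking"] = husband_read - 50
final_balance = database["joint_checking"]

print(balance_after_wife, final_balance)
```

The final write is computed from the balance read before the deposit, which is why the deposit disappears from the stored value.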
Some database systems address such problems through pessimistic concurrency control, which temporarily locks a data item against subsequent access when the data item is initially accessed. For example, when the wife's mobile phone reads the data items for the joint checking account, the database system locks these data items, such that the access request made only one half second later by the husband's tablet computer is denied read access to the joint checking account information. While this result may frustrate only the husband in this simplified example, database administrators of database systems with thousands of users may want to avoid using a locking algorithm that prevents read access for many users who may only be requesting to read data items. Consequently, pessimistic concurrency control can deliver poor performance because locking can drastically limit effective concurrency.
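The locking behavior described above might be sketched as follows. The class `PessimisticStore`, its methods, and the client names are hypothetical and exist only for this illustration; a production lock manager would also handle timeouts, shared read locks, and deadlocks, none of which are modeled here.

```python
class PessimisticStore:
    """Toy store that locks a data item on first access (illustrative only)."""

    def __init__(self, items):
        self._items = dict(items)
        self._lock_holder = {}  # data item key -> client currently holding the lock

    def read(self, key, client):
        holder = self._lock_holder.get(key)
        if holder is not None and holder != client:
            # Another client accessed the item first, so this read is denied.
            raise PermissionError(f"{key} is locked by {holder}")
        self._lock_holder[key] = client
        return self._items[key]

    def write_and_release(self, key, client, value):
        if self._lock_holder.get(key) != client:
            raise PermissionError(f"{client} does not hold the lock on {key}")
        self._items[key] = value
        del self._lock_holder[key]  # release so that other clients can proceed


store = PessimisticStore({"joint_checking": 1000})

# The wife's mobile phone reads first and thereby acquires the lock.
wife_balance = store.read("joint_checking", "wife_phone")

# The husband's tablet is denied even read access while the lock is held.
try:
    store.read("joint_checking", "husband_tablet")
    husband_denied = False
except PermissionError:
    husband_denied = True

# After the wife's write releases the lock, the husband reads fresh data.
store.write_and_release("joint_checking", "wife_phone", wife_balance + 50)
husband_balance = store.read("joint_checking", "husband_tablet")
store.write_and_release("joint_checking", "husband_tablet", husband_balance - 50)

final_balance = store.read("joint_checking", "auditor")
print(husband_denied, husband_balance, final_balance)
```

The balance ends up correct, but only because the second client was forced to wait, which is the concurrency cost the paragraph describes.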
Therefore, some database system administrators use optimistic concurrency control, which assumes that multiple transactions can frequently complete without interfering with each other. Transactions access data items without acquiring locks on those data items. Before committing a write to a data item, a transaction verifies that no other transaction has modified the data item that the transaction is about to overwrite. If this verification reveals potentially conflicting modifications, the committing transaction rolls back and can be restarted. Optimistic concurrency control therefore applies its safeguard at the write cycle, which is much later in the transaction process than the read cycle at which pessimistic concurrency control locks the data item. Optimistic concurrency control is generally used in environments with low data contention. When conflicts are rare, transactions can complete without the expense of managing access locks and without having transactions wait for other transactions' access locks to clear, leading to higher throughput than pessimistic concurrency control. However, if contention for data items is frequent, the cost of repeatedly restarting transactions hurts performance significantly, such that pessimistic concurrency control may perform better under these conditions.
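One common way to perform the verification described above is to attach a version number to each data item and compare it at commit time. The sketch below assumes that technique; the text does not specify how the verification is done, and `OptimisticStore`, `ConflictError`, and `apply_delta` are hypothetical names. The sketch is single-threaded, so the check and the write inside `commit` are not protected against true parallel execution.

```python
class ConflictError(Exception):
    """Raised when a data item changed after the transaction read it."""


class OptimisticStore:
    """Toy store that validates at commit time using version numbers (an assumption)."""

    def __init__(self, items):
        # Each data item is stored as (value, version).
        self._items = {key: (value, 0) for key, value in items.items()}

    def read(self, key):
        return self._items[key]  # no lock is acquired

    def commit(self, key, new_value, read_version):
        _, current_version = self._items[key]
        if current_version != read_version:
            # Another transaction committed first: roll back.
            raise ConflictError(key)
        self._items[key] = (new_value, current_version + 1)


def apply_delta(store, key, delta, max_attempts=3):
    """Restart the read-modify-write until it commits; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        value, version = store.read(key)
        try:
            store.commit(key, value + delta, version)
            return attempt
        except ConflictError:
            continue
    raise ConflictError(key)


store = OptimisticStore({"joint_checking": 1000})

# Both devices read the same value and version without locking.
wife_value, wife_version = store.read("joint_checking")
husband_value, husband_version = store.read("joint_checking")

# The wife's commit succeeds and advances the version.
store.commit("joint_checking", wife_value + 50, wife_version)

# The husband's commit is based on a stale version and is rejected.
try:
    store.commit("joint_checking", husband_value - 50, husband_version)
    conflict_detected = False
except ConflictError:
    conflict_detected = True

# Restarting the husband's transaction re-reads the data and succeeds.
attempts = apply_delta(store, "joint_checking", -50)
final_value, final_version = store.read("joint_checking")
print(conflict_detected, attempts, final_value, final_version)
```

Under frequent contention the retry loop in `apply_delta` would run many times per transaction, which is the restart cost noted above.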
In database systems with multiple readers and writers, there may be occasions when it is necessary to write multiple data items in an atomic fashion so that readers and writers do not have inconsistent views of underlying data. For example, the wife's mobile phone may request to pay $100 electronically from the joint checking account to an electric bill account on June 10th, which requires access to the three data items for the joint checking account balance, payee, and payment date. Continuing this example, the husband's tablet computer may request to pay $50 electronically from the joint checking account to a credit card bill account on June 20th, which requires access to the same three data items for the joint checking account balance, payee, and payment date.
For this example, a database system needs to allow atomic actions for all three of the data items, such that all three of the write requests from the wife's mobile phone are committed together and all three of the write requests from the husband's tablet computer are committed together. Committing only one write request from one computer for one data item, or committing write requests out of order, may risk the writing of inconsistent data. For example, committing only one of the multiple write requests for a transaction may result in paying the electric bill amount to the credit card account, or in paying the electric bill late, on the due date for the credit card bill. Some NoSQL database systems, such as HBase, currently allow atomic actions only for a single row, thereby limiting the complexity of these NoSQL database systems or forcing users to accept inconsistent data. Database systems such as HBase store large quantities of sparse data.
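The all-or-nothing requirement for the three data items can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `AtomicStore`, the single commit lock, the key names, and the starting balance of $1,000 carried over from the earlier example. It is not how HBase or any particular database system implements atomicity.

```python
import threading


class AtomicStore:
    """Toy store that commits a batch of writes together or not at all."""

    def __init__(self, items):
        self._items = dict(items)
        self._commit_lock = threading.Lock()

    def commit_all(self, writes):
        with self._commit_lock:
            staged = dict(self._items)  # work on a copy
            for key, value in writes.items():
                if key not in staged:
                    # Rejecting here leaves the live data untouched.
                    raise KeyError(key)
                staged[key] = value
            self._items = staged  # publish every write in one step

    def snapshot(self):
        with self._commit_lock:
            return dict(self._items)


store = AtomicStore({"balance": 1000, "payee": None, "payment_date": None})

# The wife's three writes are committed together.
store.commit_all({"balance": 900, "payee": "electric bill", "payment_date": "June 10"})
after_wife = store.snapshot()

# The husband's three writes are committed together.
store.commit_all({"balance": 850, "payee": "credit card bill", "payment_date": "June 20"})
after_husband = store.snapshot()

# A batch containing an invalid write is rejected as a whole.
try:
    store.commit_all({"balance": 0, "unknown_item": "x"})
except KeyError:
    pass
after_rejected = store.snapshot()

print(after_wife, after_husband, after_rejected == after_husband)
```

Because each batch is published in one step, a reader never observes the payee from one transaction paired with the payment date from the other.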