The ability to access, store, and manage data has become a critical facet of today's economy. Likely not a minute (or second) goes by in which data is not manipulated electronically by an individual or organization. Virtually every electronic system available—from bank accounts to medical records and air traffic control—is dependent on data. As the volume of data handled increases, so does the need to provide data systems such as databases, key-value stores, file systems, data management systems, and data stores that manage data reliably and efficiently.
One way to provide data reliability is to process data in data transactions. A data transaction is a logical unit of operations performed on data that is treated in a coherent and reliable way independent of other transactions. A transaction must be atomic, consistent, isolated, and durable (the ACID properties). A system of locks is typically used to provide these capabilities. A lock is a synchronization mechanism for governing access to a resource when there are multiple concurrent threads of execution. A user may be permitted to modify data only within a transaction that holds a lock granting exclusive access to the locked data until the lock is released. Many types of locks exist, including shared locks, which allow several transactions to read the same data concurrently.
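The difference between shared and exclusive locks can be illustrated with a minimal sketch. The `LockTable` class, its method names, and the mode labels "S" and "X" are hypothetical names chosen for this example; they do not describe any particular data system. The sketch assumes a single-threaded caller and refuses incompatible requests instead of queuing them.

```python
from collections import defaultdict


class LockTable:
    """Toy lock table (hypothetical): "S" = shared, "X" = exclusive."""

    def __init__(self):
        # key -> {transaction id: lock mode currently held}
        self._holders = defaultdict(dict)

    def try_acquire(self, txn_id, key, mode):
        """Grant the lock if it is compatible with locks held by other transactions."""
        holders = self._holders[key]
        others = [m for t, m in holders.items() if t != txn_id]
        if mode == "S":
            # Shared locks are compatible only with other shared locks.
            compatible = all(m == "S" for m in others)
        else:
            # An exclusive lock is compatible with no other holder.
            compatible = not others
        if not compatible:
            return False
        # Never downgrade: keep "X" if the transaction already holds it.
        if holders.get(txn_id) != "X":
            holders[txn_id] = mode
        return True

    def release_all(self, txn_id):
        """Release every lock held by the given transaction."""
        for holders in self._holders.values():
            holders.pop(txn_id, None)
```

In this sketch two readers can hold a shared lock on the same key at once, while a writer obtains its exclusive lock only after both readers have released theirs.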
Traditionally, locks for a given transaction are released only after the transaction is committed, that is, only after all changes made to the transaction data are made permanent. A transaction is not considered committed until a commit log record is generated and written to stable storage. Writing the commit log record may take longer than executing the transaction itself when the transaction does not incur a buffer fault. For example, if the underlying database system has enough memory that a given transaction does not incur a buffer fault, then flushing the commit record to stable storage typically takes at least an order of magnitude more time than transaction execution. If a transaction that performs 20,000 to 100,000 instructions acquires locks, e.g., key value locks in a B-tree index, right at the start of the transaction and holds them until the transaction is committed, the transaction may retain the locks for about 0.01 ms while it is executing and for about another 0.1 ms (or even 10 ms) during commit processing, i.e., after the transaction logic is complete. In systems with large memory and large buffer pools, short transactions may therefore complete in much less time than it takes to log their commit records on stable storage. The time it takes to log a commit record depends upon the type of stable storage used (e.g., disk, flash memory, or memristor).
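The arithmetic behind these figures can be worked through in a few lines. The function and constant names below are hypothetical, and the numbers are only the illustrative values quoted above (about 0.01 ms of execution, and 0.1 ms or 10 ms to flush the commit record), not measurements.

```python
# Illustrative figures from the paragraph above; real values depend on the
# hardware and on the type of stable storage used.
EXEC_MS = 0.01           # lock hold time while the transaction logic runs
FLUSH_MS = (0.1, 10.0)   # commit-record flush time, fast and slow cases


def lock_hold(exec_ms, flush_ms):
    """Return (total lock hold time in ms, fraction spent waiting on the flush)."""
    total_ms = exec_ms + flush_ms
    return total_ms, flush_ms / total_ms


for flush_ms in FLUSH_MS:
    total_ms, flush_share = lock_hold(EXEC_MS, flush_ms)
    print(f"flush={flush_ms} ms: locks held {total_ms:.2f} ms, "
          f"{flush_share:.1%} of that during commit processing")
```

With these figures, roughly 91% of the lock hold time in the 0.1 ms case, and more than 99% in the 10 ms case, falls after the transaction logic has already completed.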
Given this inefficiency, an Early Lock Release (“ELR”) approach has been developed to allow a transaction to release its locks as soon as a commit record is allocated in a log buffer. That is, transaction locks may be released before the commit record is flushed to stable storage and before the transaction becomes durable. This ELR approach enables a dramatic reduction of lock contention and provides considerable performance improvements. However, it can also produce wrong results, e.g., incorrect data updates, if it fails to register and respect commit dependencies among participating transactions. It also does not fully optimize distributed transactions (e.g., when multiple replicas are maintained). Improving transaction efficiency without the drawbacks of ELR therefore remains one of the key challenges in data processing today.
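The ordering difference between traditional lock release and ELR can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: `LogBuffer`, `commit`, and `is_durable` are hypothetical names, the log is an in-memory list, and "flushing" merely advances a counter. The `is_durable` check shows one possible way to respect a commit dependency, namely by not acknowledging a dependent transaction until the log has been flushed past the commit record it depends on; it is not presented as the method of any specific system.

```python
class LogBuffer:
    """Toy in-memory log buffer (hypothetical); LSNs are 1-based positions."""

    def __init__(self):
        self.records = []
        self.flushed_lsn = 0  # highest LSN assumed to be on stable storage

    def append(self, record):
        """Allocate a record in the buffer and return its LSN."""
        self.records.append(record)
        return len(self.records)

    def flush(self):
        """Pretend to write every buffered record to stable storage."""
        self.flushed_lsn = len(self.records)

    def is_durable(self, lsn):
        """True once the record with this LSN has been flushed."""
        return self.flushed_lsn >= lsn


def commit(txn_id, locks, log, early_release, trace):
    """Commit a transaction, recording the order of steps in `trace`."""
    lsn = log.append(("COMMIT", txn_id))
    trace.append("commit record allocated")
    if early_release:
        # ELR: release locks before the commit record reaches stable storage.
        locks.clear()
        trace.append("locks released")
    log.flush()
    trace.append("commit record flushed")
    if not early_release:
        # Traditional: hold locks until the transaction is durable.
        locks.clear()
        trace.append("locks released")
    return lsn
```

Under ELR, another transaction may read data released by a transaction whose commit record is still only in the buffer. In this sketch, such a reader would check `log.is_durable(lsn)` for that commit record before reporting its own result.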