Transactional commit in large-scale storage systems, such as BigTable, uses lock-based, distributed algorithms such as two-phase commit. Lock-based algorithms become inefficient when a failed transaction leaves behind unreleased locks that prevent other transactions from making progress. Commercial data storage systems often implement Snapshot Isolation (SI) because it allows high concurrency between transactions. SI guarantees that all reads of a transaction are performed on a snapshot of the database that corresponds to a valid database state unaffected by any concurrent transaction. To implement SI, the database maintains multiple versions of each row, and each transaction observes the row versions that were committed before its start time.
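The multi-version read rule of SI can be illustrated with a minimal sketch. It assumes that every committed row version is tagged with an integer commit timestamp, that versions of a row are installed in increasing timestamp order, and that a transaction reads, for each row, the newest version committed strictly before its start timestamp. The class `MultiVersionStore` and its methods are hypothetical names chosen for illustration only.

```python
from typing import Any, Dict, List, Optional, Tuple


class MultiVersionStore:
    """Hypothetical store that keeps every committed version of each row."""

    def __init__(self) -> None:
        # row key -> list of (commit_ts, value), kept in increasing commit_ts order
        self._versions: Dict[str, List[Tuple[int, Any]]] = {}

    def install(self, key: str, commit_ts: int, value: Any) -> None:
        # Assumption: callers install versions of a row in increasing commit_ts order.
        self._versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key: str, start_ts: int) -> Optional[Any]:
        # Snapshot read: return the newest version committed before start_ts,
        # or None if the row had no committed version at that point.
        result = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts < start_ts:
                result = value
            else:
                break
        return result
```

For example, after `install("r1", 5, "a")` and `install("r1", 9, "b")`, a transaction that started at timestamp 7 reads `"a"`, one that started at 10 reads `"b"`, and one that started at 3 sees no version of the row. Transactions with different start times thus observe different row versions without blocking one another.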
In an SI-based system, two general approaches are used for detecting a conflict between two concurrent transactions that write to the same data element (e.g., a row): 1) a lock-based approach, which locks modified rows so that concurrent transactions cannot modify them, or 2) a lock-free approach with a centralized Transaction Status Oracle (TSO) that monitors the commits of all transactions. In lock-based approaches, the locks held by an incomplete transaction whose client has failed may prevent other transactions from making progress during the recovery period.
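The lock-free approach can be sketched as follows. This is a minimal, single-threaded illustration and not necessarily the mechanism of any particular system: it assumes the oracle issues all start and commit timestamps from one monotonically increasing counter and remembers, for each row, the commit timestamp of the last transaction that wrote it. A commit request is rejected if any row in its write set was committed by another transaction after the requester started. `TransactionStatusOracle`, `begin`, and `commit` are hypothetical names; a real oracle would also have to process commit requests atomically and persist its state.

```python
import itertools
from typing import Dict, Iterable, Optional


class TransactionStatusOracle:
    """Hypothetical centralized oracle that detects write-write conflicts without locks."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._last_commit: Dict[str, int] = {}  # row -> commit_ts of its latest writer

    def begin(self) -> int:
        # The start timestamp defines the snapshot the transaction reads from.
        return next(self._clock)

    def commit(self, start_ts: int, write_set: Iterable[str]) -> Optional[int]:
        rows = list(write_set)
        # Conflict: a concurrent transaction committed a write to one of these
        # rows after this transaction started.
        for row in rows:
            if self._last_commit.get(row, 0) > start_ts:
                return None  # abort; no locks were taken, so nothing must be released
        commit_ts = next(self._clock)
        for row in rows:
            self._last_commit[row] = commit_ts
        return commit_ts
```

In this sketch, if two transactions start at timestamps 1 and 2 and both write row `r1`, the first commit succeeds with timestamp 3 and the second is aborted because 3 > 2. A transaction whose client fails simply never calls `commit`, so it leaves nothing behind that could block other transactions.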
It is, however, challenging to design a TSO that does not become a bottleneck for system scalability and that preserves the reliability of its data in the presence of a failure of the node hosting the TSO. Large distributed storage systems therefore implement lock-based transactional commit algorithms, forgoing the benefits of lock-free approaches.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.