Some databases distribute data or database processes or both among multiple nodes. Each node is a set of one or more processors with associated memory devices (“memory”). Such databases can enhance performance by moving blocks of data to be used by a processor on a local node from a relatively slower access medium to a relatively faster access medium. The nature of the relatively faster and relatively slower access mediums may vary from implementation to implementation. For example, the relatively slower access medium may be a disk in a disk drive or volatile memory on a remote node, while the relatively faster access medium (generally referred to herein as the “cache”) may be volatile memory on the local node. Alternatively, the relatively slower access medium may be a relatively slower disk drive, while the cache is simply a relatively faster disk drive. The techniques described herein are not limited to any particular forms of access media.
In shared disk systems, multiple nodes can access the same block of data on disk. Data in local caches can become inconsistent if one node reads a data block from disk into its local cache after another node has changed that same data block in its own local cache. To prevent inconsistency of data in data blocks, a lock mechanism is employed. With a lock mechanism, no operation, such as a read or a write, of a data block may begin at a node until the node receives a lock for that data block for that operation from a lock manager. The lock manager does not grant locks for operating on a particular block to a node while another node has a lock for writing to that data block. Consequently, the lock for writing is often called an “exclusive” lock. When the writing node is finished, it releases its exclusive lock to the lock manager, which may then grant locks to other nodes for that block. The lock manager may grant locks for reading a particular block to a node while another node has a lock for reading that same data block, because the data will be consistent if multiple nodes simply read the data block. The lock for reading only, without writing, is often called a “shared” lock.
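The grant rules described above can be sketched as follows. This is a minimal single-process illustration, not a distributed implementation; the class and method names are hypothetical, and real lock managers handle matters such as fairness, deadlock detection, and messaging between nodes that are omitted here.

```python
import threading

class LockManager:
    """Illustrative shared/exclusive lock manager for data blocks."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = {}   # block_id -> count of shared locks held
        self._writers = {}   # block_id -> True if an exclusive lock is held

    def acquire_shared(self, block_id):
        # A shared (read) lock is granted as long as no holder has
        # an exclusive (write) lock on the same block.
        with self._cond:
            while self._writers.get(block_id):
                self._cond.wait()
            self._readers[block_id] = self._readers.get(block_id, 0) + 1

    def release_shared(self, block_id):
        with self._cond:
            self._readers[block_id] -= 1
            if self._readers[block_id] == 0:
                del self._readers[block_id]
            self._cond.notify_all()

    def acquire_exclusive(self, block_id):
        # An exclusive (write) lock is granted only when no other holder
        # has any lock, shared or exclusive, on the block.
        with self._cond:
            while self._writers.get(block_id) or self._readers.get(block_id, 0) > 0:
                self._cond.wait()
            self._writers[block_id] = True

    def release_exclusive(self, block_id):
        # Releasing the exclusive lock lets the manager grant waiting requests.
        with self._cond:
            del self._writers[block_id]
            self._cond.notify_all()
```

Note that multiple shared locks on the same block may be outstanding at once, while an exclusive lock excludes all other holders, matching the consistency rules stated above.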
While suitable for many purposes, lock mechanisms have some disadvantages. One disadvantage is that conventional lock mechanisms impose a delay on a node that performs an operation on a data block that does not already reside in the node's cache. This delay increases the latency of the system—the time between the initiation of a set of one or more operations and the completion of the set. The node requests a lock for the data block, waits for a response from the lock manager that grants the requested lock, and then begins retrieving the data block from the disk or remote location.
Typically, the amount of time expended to retrieve a data block from disk or from a remote node is substantial, consuming hundreds to thousands of microseconds (10⁻⁶ seconds). In many systems, the amount of time to obtain a lock may also be substantial, consuming hundreds of microseconds. Thus, input and output (I/O) involving data block reads and writes for some distributed systems with lock mechanisms can significantly increase latency relative to distributed systems without lock mechanisms. In a database system limited by I/O throughput, the increased latency further limits the performance of the system. However, lock mechanisms are essential in applications where data consistency is valued, so the extra latency is tolerated as the cost of data consistency.
Based on the foregoing description, there is a clear need for techniques to reduce the latency in obtaining data blocks that do not already reside in cache while providing data consistency in distributed, shared disk systems.
In general, there is a need for techniques to reduce latency in obtaining any resource that does not already reside in cache while providing for consistency of the contents of the resource. The resource need not be a data block of a database, but may be any data managed by a system that is capable of changing that data. For example, the resource may be a web page to be presented by a web server in response to a first request, while a page editor is running that may change the contents of the page in response to a different request. In another example, the resource may be a portion of memory provided in response to one request for memory among multiple requests for memory.
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art merely due to their inclusion in this section.