To improve scalability, some database and file systems permit more than one database or file server (each running separately) to access shared storage, such as disk media, concurrently. Each database or file server has a cache for caching shared data items, such as disk blocks. Such multi-node systems are referred to herein as clusters. One problem in a cluster is the overhead of obtaining a data item together with the lock associated with that data item.
The entities that desire access to a data item are referred to herein as “Requestors” for the data item. The one or more entities that currently hold the rights to access the data item are referred to herein as the “Holders” of the data item. The entity that is responsible for keeping track of the locks associated with the data item, for all the nodes in a cluster, is referred to herein as the “Master” of the data item. The Master, Holder(s), and Requestor(s) of a data item may be separate processes on a single node, processes on separate nodes, or a mix in which some processes share a node and others run on separate nodes.
In a typical scenario, a Holder holds the most recent version of a data item in its cache. A Requestor requests some level of access, and hence a lock, on the data item. The type of lock that a Requestor requires depends on the type of access it wishes to perform, so lock requests typically specify the “lock mode” of the desired lock. Consequently, obtaining a particular type of lock may also be called “obtaining a lock in a particular mode”. For example, to read a data item, a Requestor must obtain an S lock (i.e. a share lock); to modify a data item, it must obtain an X lock (i.e. an exclusive lock). While an X lock is held, no other Holder may hold any lock on the data item. However, several Holders may hold S locks concurrently.
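The mode rules above amount to a small compatibility matrix. The following is a minimal Python sketch that assumes only the two modes described here exist; the names `LockMode`, `compatible` and `can_grant` are hypothetical and are not taken from any particular lock manager.

```python
from enum import Enum


class LockMode(Enum):
    S = "share"      # read access; several Holders may hold it at once
    X = "exclusive"  # write access; excludes every other lock


def compatible(requested: LockMode, held: LockMode) -> bool:
    """Two locks on the same data item can coexist only if both are S locks."""
    return requested is LockMode.S and held is LockMode.S


def can_grant(requested: LockMode, held_modes: list[LockMode]) -> bool:
    """A request is grantable only if it is compatible with every lock currently held."""
    return all(compatible(requested, held) for held in held_modes)


print(can_grant(LockMode.S, [LockMode.S, LockMode.S]))
print(can_grant(LockMode.S, [LockMode.X]))
print(can_grant(LockMode.X, []))
```

In the scenario that follows, `can_grant(LockMode.S, [LockMode.X])` is false, which is why the Holder's X lock must be down-converted before the Requestor can be granted its S lock.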
Various messages must be exchanged for a Requestor to obtain a data item and the lock associated with it. Referring to FIGS. 1A and 1B, FIG. 1A is a block diagram portraying a cluster in which a Master 100, a Holder 110 and a Requestor 120 are on separate nodes. In this scenario, the Requestor 120 needs an S lock and the Holder 110 already has an X lock. FIG. 1B shows the script of messages that would be exchanged in the scenario depicted in FIG. 1A, along with the parameters associated with these messages.
Most likely, the connection between the Holder 110 on Node A and the Requestor 120 on Node B is a high-speed connection, whereas the connection between the Requestor 120 on Node B and the Master 100 on Node C is slower.
Initially, the Holder 110 has the data item and an X lock on it. Subsequently, the Requestor 120 needs access to this data item and an S lock on it. To request access to the data item and obtain the S lock, the Requestor 120 on Node B sends a lock request message (LRM) to the Master 100 on Node C. The lock request message carries two parameters: the memory location into which the requested data item will ultimately be transferred, and the desired lock mode, which indicates that the Requestor 120 needs an S lock.
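The parameters of the lock request message can be pictured as a simple immutable record. In this sketch the field names `requestor`, `item_id`, `dest_buffer` and `mode` are hypothetical; only the destination memory location and the desired lock mode come from the description above, and `item_id` is an assumed way of naming which data item is wanted.

```python
from dataclasses import dataclass
from enum import Enum


class LockMode(Enum):
    S = "share"
    X = "exclusive"


@dataclass(frozen=True)
class LockRequest:
    """Hypothetical layout of a lock request message sent from a Requestor to the Master."""

    requestor: str    # sender, e.g. the process on Node B
    item_id: int      # assumed identifier of the data item, such as a disk block number
    dest_buffer: int  # memory location into which the data item will be transferred
    mode: LockMode    # desired lock mode


lrm = LockRequest(requestor="B", item_id=42, dest_buffer=0x7F00_0000, mode=LockMode.S)
print(lrm.mode.value, hex(lrm.dest_buffer))
```

Making the record frozen reflects that a message, once sent, is not modified in flight.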
When the Master 100 receives the lock request message, it sends an “inform lock holder” message to the Holder 110 on Node A, informing the Holder 110 that a Requestor 120 needs the data item in share mode.
The Holder 110 then transfers the requested data item to the memory location specified by the Requestor 120, using a memory-to-memory transfer. In addition, the Holder 110 on Node A down-converts its lock from an X lock to an S lock and notifies the Master 100 of this down-conversion with a down-convert message. The transfer of the requested data item (TBM) and the down-convert message may be sent in parallel.
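The Holder's two independent actions can be sketched as follows. This is a toy, single-process illustration: the `Holder` class and its methods are hypothetical, a local `bytearray` copy stands in for the memory-to-memory transfer, and returning a string stands in for sending the down-convert message. Performing the local down-conversion before the copy is an assumed ordering, not one stated above.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class LockMode(Enum):
    S = "S"
    X = "X"


class Holder:
    """Hypothetical Holder-side handler for an 'inform lock holder' message."""

    def __init__(self, item: bytes) -> None:
        self.cache = bytearray(item)  # most recent version of the data item
        self.mode = LockMode.X        # the Holder starts out holding an X lock
        self.sent: list[str] = []     # messages issued, in submission order

    def transfer(self, dest: bytearray) -> str:
        # Copy the cached item directly into the Requestor's buffer; a local
        # bytearray copy stands in for a memory-to-memory transfer.
        dest[: len(self.cache)] = self.cache
        return "data transfer"

    def notify_master(self) -> str:
        # Stand-in for sending the down-convert message to the Master.
        return "down-convert X->S"

    def on_inform(self, dest: bytearray) -> None:
        # Assumed ordering: give up write access before the copy is taken.
        self.mode = LockMode.S
        # The transfer and the down-convert message are independent,
        # so they are issued concurrently.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(self.transfer, dest), pool.submit(self.notify_master)]
            self.sent = [f.result() for f in futures]


requestor_buffer = bytearray(16)  # hypothetical Requestor memory location
holder = Holder(b"block-42 v7")
holder.on_inform(requestor_buffer)
print(holder.mode, holder.sent, bytes(requestor_buffer[:11]))
```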
When the Master 100 receives the down-convert message, the Master 100 grants the Requestor 120 on Node B an S lock by sending it a lock grant message. Only after the Requestor 120 receives the lock grant message may it access the data item.
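Putting the steps of FIGS. 1A and 1B together, the toy model below records each message in the order it is sent and marks the Requestor as allowed to use the data item only once the lock grant has been sent. The `Cluster` class, the node labels and the message strings are hypothetical, and everything runs in one process with no real network.

```python
from enum import Enum


class LockMode(Enum):
    S = "S"
    X = "X"


class Cluster:
    """Toy model of the FIG. 1A scenario: Holder on A, Requestor on B, Master on C."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str, str]] = []  # (sender, receiver, message)
        self.locks = {"A": LockMode.X}             # Master's record: node -> lock mode
        self.cache = {"A": b"block-42 v7"}         # Holder's cached copy of the item
        self.buffers: dict[str, bytes] = {}        # Requestor-side memory locations
        self.may_access: set[str] = set()          # nodes allowed to use the item

    def send(self, src: str, dst: str, msg: str) -> None:
        self.log.append((src, dst, msg))

    def request_share(self, requestor: str, holder: str, master: str = "C") -> None:
        # Requestor -> Master: lock request message asking for an S lock.
        self.send(requestor, master, "lock request")
        # Master -> Holder: inform lock holder that an S request is pending.
        self.send(master, holder, "inform lock holder")
        # Holder -> Requestor: the data item lands in the Requestor's buffer,
        # but the Requestor may not use it until the grant arrives.
        self.buffers[requestor] = self.cache[holder]
        self.send(holder, requestor, "data transfer")
        # Holder -> Master: down-convert (may overlap with the transfer above);
        # the Master records the Holder's new S mode on receipt.
        self.send(holder, master, "down-convert")
        self.locks[holder] = LockMode.S
        # Master -> Requestor: lock grant; only now may the Requestor use the item.
        self.locks[requestor] = LockMode.S
        self.send(master, requestor, "lock grant")
        self.may_access.add(requestor)


cluster = Cluster()
cluster.request_share(requestor="B", holder="A")
for src, dst, msg in cluster.log:
    print(f"{src} -> {dst}: {msg}")
```

The model keeps the data transfer and the lock grant as separate events so that the gap between "data has arrived" and "data may be used" stays visible.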
In this scenario, the latency, i.e. the time from the initial request (the lock request message) to the time the data item can be used (when the Requestor 120 receives the lock grant message), is four small messages sent in sequence: the lock request message, the inform lock holder message, the down-convert message, and the lock grant message. The traffic is those same four small messages plus one transfer of the requested data item; because the transfer proceeds in parallel with the down-convert message, it is not counted in the latency.
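The latency and traffic figures can be checked by treating the message script as a dependency graph and counting hops along its longest chain. This sketch assumes every hop takes the same time, ignoring the difference between the fast Holder-to-Requestor link and the slower link to the Master; the dependency table is reconstructed from the description above, not copied from FIG. 1B.

```python
# Each message in the scenario, with the messages that must arrive before it is sent.
SCRIPT: dict[str, dict] = {
    "lock request":       {"kind": "small", "after": []},
    "inform lock holder": {"kind": "small", "after": ["lock request"]},
    "data transfer":      {"kind": "block", "after": ["inform lock holder"]},
    "down-convert":       {"kind": "small", "after": ["inform lock holder"]},
    "lock grant":         {"kind": "small", "after": ["down-convert"]},
}


def hops_to(msg: str) -> int:
    """Sequential message hops up to and including the arrival of `msg`."""
    return 1 + max((hops_to(dep) for dep in SCRIPT[msg]["after"]), default=0)


def latency_hops() -> int:
    # The Requestor needs both the data item and the lock grant before proceeding.
    return max(hops_to("data transfer"), hops_to("lock grant"))


def traffic() -> dict[str, int]:
    counts = {"small": 0, "block": 0}
    for info in SCRIPT.values():
        counts[info["kind"]] += 1
    return counts


print(latency_hops())  # 4
print(traffic())       # {'small': 4, 'block': 1}
```

Under the equal-hop assumption the data transfer arrives after three hops, one hop before the lock grant, so it adds to traffic but not to latency.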
To increase the speed of operations in the cluster, it is desirable to provide techniques that reduce the amount of time that Requestors must wait before they can access the data items they request.