The growth in the deployment of large agglomerations of independent computers as computational clusters has given rise to the need for the individual computers in these clusters to access common pools of data. Individual computers in clusters need to be able to read data from, and write data to, shared storage devices and shared display devices. Because a cluster may be assembled from many thousands of individual computers, each of which generates data access requests on an independent basis, enabling shared access to a common data pool requires a scheme that ensures that the data retrieved by some of the computers in the cluster is not corrupted by data modifications made by other computers in the cluster.
In a typical clustered computer deployment, a shared storage medium, such as a single disk drive, a memory unit, or a digital display device with a so-called frame buffer design, is connected via a data transport network 23 to a number of independent computers. The function of the computers is to process, in some fashion, data that is held on the storage medium. In the course of this activity, the data on the storage medium is both read and written by the computers.
The computers that make up the cluster process the data on the shared medium asynchronously. There is no supervisory mechanism in place that grants the individual computers the right to access the data on the storage medium in a fashion that would ensure the integrity of data retrieval.
Any transaction produced by one of the cluster computers occupies a time window that begins when the transaction is initiated by the computer and spans the combined time periods required to transport the transaction to the storage medium, execute the transaction, and initiate transport of the response back to the computer. During this time span, one or more of the other computers sharing the storage medium could initiate a data modification transaction whose time of initiation is after that of the original transaction but within its time span. Without intervention, the data on the storage medium could conceivably be modified while it is being prepared for transmission to the original computer.
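The time-window hazard described above can be illustrated with a minimal, deterministic sketch. Everything in it is an assumption made for illustration: the two-block record, the block names, and the function name `torn_read_demo` do not come from any described system. A retrieval transaction copies a record block by block while an update from a second computer lands inside its window, so the response mixes old and new data.

```python
def torn_read_demo():
    """Simulate an update arriving inside a retrieval's time window.

    Hypothetical model: the storage medium holds one record split
    across two blocks that are meant to always carry the same version.
    """
    storage = {"block0": "v1", "block1": "v1"}
    response = []

    # Computer A's retrieval begins: the first block is copied
    # into the response being prepared for transmission.
    response.append(storage["block0"])

    # Computer B's update is initiated after A's transaction began
    # but before A's response is complete. Nothing prevents it.
    storage["block0"] = "v2"
    storage["block1"] = "v2"

    # Computer A's retrieval continues with the second block,
    # which now carries the newer version.
    response.append(storage["block1"])
    return response


if __name__ == "__main__":
    # The two halves of the response come from different versions.
    print(torn_read_demo())
```

In this sketch, neither version of the record was ever stored in the form that computer A receives, which is the kind of corruption the passage describes.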
Other scenarios that have the potential to produce undesirable results from transactions in a clustered computer environment include the arrival of out-of-order transactions at the storage medium, such as when a data retrieval transaction followed by a data update transaction from the same computer arrive in reverse order, and the execution of a data update transaction while multiple other computers are in the process of retrieving the same data element.
The traditional approach to addressing the problem of shared access to a data element on a shared storage medium is to implement a scheme of locks that serializes access to the data element by forcing each transaction initiator to wait until it gains exclusive access to a lock on the data element. The specific implementation of the locking mechanism is dependent on a variety of factors related to the nature of the computing application being used, the volatility of the data stored on the storage medium, and the scale of the computer cluster in use. Regardless of the specifics of the implementation, all of the schemes found in prior art impose on the transaction initiator the requirement to schedule its transactions in a manner that ensures atomically correct access to the data element in question.
FIG. 1 is an example of a typical scheme found in prior computer related art dealing with the issues of shared access to a storage or display medium. A centralized scheme of access management is implemented by using a meta-data controller (MDC) or a centralized lock manager. Computers in the cluster are required to first contact the MDC or lock service to gain authority to access the shared storage medium. When they gain the required authorization, the computers submit transactions to the shared storage medium controller. When their transactions are completed, they contact the MDC or lock manager again to release the access authority so that other computers can gain access to the storage device.
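The acquire, access, and release sequence described above can be sketched as follows. This is a minimal single-process illustration, not the scheme of any particular prior-art system: threads stand in for cluster computers, and the names `LockManager`, `SharedStorage`, `client`, and `run_cluster` are hypothetical.

```python
import threading


class LockManager:
    """Hypothetical stand-in for the MDC or centralized lock manager."""

    def __init__(self):
        self._access_lock = threading.Lock()   # authority to use the storage
        self._count_guard = threading.Lock()   # protects the request counter
        self.requests_handled = 0

    def _count_request(self):
        with self._count_guard:
            self.requests_handled += 1

    def acquire(self):
        # Every client must contact the manager before touching storage.
        self._count_request()
        self._access_lock.acquire()

    def release(self):
        # Every client must contact the manager again when finished.
        self._count_request()
        self._access_lock.release()


class SharedStorage:
    """Hypothetical shared storage medium holding one data element."""

    def __init__(self):
        self.value = 0


def client(manager, storage, updates):
    for _ in range(updates):
        manager.acquire()                  # 1. gain access authority
        try:
            current = storage.value        # 2. submit transactions
            storage.value = current + 1
        finally:
            manager.release()              # 3. release the authority


def run_cluster(num_clients, updates_per_client):
    manager, storage = LockManager(), SharedStorage()
    threads = [
        threading.Thread(target=client, args=(manager, storage, updates_per_client))
        for _ in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return storage.value, manager.requests_handled
```

In this sketch every update costs two round trips to the lock manager, so the manager handles twice as many requests as there are transactions, and every transaction in the cluster passes through the same single lock.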
In prior computer-related art, typical examples of the use of lock mechanisms to address the problems of multiple computer access to shared storage media include the introduction of centralized meta-data controller systems, the use of event-driven input/output schedulers, the use of pseudo-channel semaphores, and the use of remote procedure call based callback mechanisms. A characteristic of all of these mechanisms is that all transactions are serialized through a single gateway path controlled through the lock mechanism. Use of this type of mechanism requires the transaction initiators to acquire and maintain knowledge of the state of the target data element, and consequently to schedule their transactions based on shared management of that state.
A significant drawback of the schemes found in prior computer-related art is that as the number of client computers increases, the load on the centralized MDC or lock manager increases to the point that access to the storage device eventually becomes degraded. The requirement that client computers contact the centralized access control service in order to schedule their own data access transactions inevitably limits the per-client transaction rate to the capacity of the MDC or lock manager to handle access requests. Regardless of the efforts that may be made to increase the performance characteristics of the centralized service, rising numbers of clients will always overwhelm the capabilities of the MDC.
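The scaling limit described above can be made concrete with a rough back-of-the-envelope bound. The model is an assumption made for illustration only: the centralized service handles a fixed number of requests per second, each transaction costs a fixed number of requests to that service (two here, one to acquire and one to release), and capacity is shared evenly among clients. The function name and the figures are hypothetical.

```python
def per_client_rate_bound(service_capacity, num_clients, requests_per_transaction=2):
    """Upper bound on transactions per second available to each client.

    service_capacity: requests per second the centralized service can
        handle (assumed fixed).
    requests_per_transaction: contacts with the service per transaction
        (assumed: one acquire plus one release).
    """
    if num_clients <= 0 or requests_per_transaction <= 0:
        raise ValueError("num_clients and requests_per_transaction must be positive")
    return service_capacity / (requests_per_transaction * num_clients)


if __name__ == "__main__":
    # With capacity held constant, the per-client bound shrinks in
    # inverse proportion to the number of clients.
    for clients in (100, 1000, 10000):
        print(clients, per_client_rate_bound(10_000, clients))
```

Under these assumptions, raising the capacity of the centralized service only moves the bound by a constant factor, while adding clients keeps lowering it.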
Therefore, there is a need for a system and method for scheduling transactions and managing shared access to a storage medium that ensures client-centered consistency without the use of a centralized locking system, and thus without imposing a limit on the scalability of the shared storage system or on the number of concurrent clients that can access the data on that system.