1. Technical Field
The present invention relates to computer storage controllers in general and in particular to a method for providing shared storage to multiple storage controllers. Still more particularly, the present invention relates to a method for enabling multiple storage controllers to participate in advanced storage functions where the ownership and locking at a metadata level can be distributed among any one of the storage controllers.
2. Description of the Related Art
In the field of computer storage systems, there is an increasing demand for the so-called advanced functions. Advanced functions go beyond the simple input/output (I/O) functions of conventional storage systems. Advanced functions depend on the control of the metadata used to retain state information related to the real or user data stored in a computer system. The manipulations available using advanced functions enable various actions to be applied quickly to virtual images of data, while leaving the real data available for use by user applications. One example of such an advanced function is Flash Copy.
The Flash Copy function is sometimes known as the Point-In-Time copy or T0-copy. At the highest level, Flash Copy is a function where a second copy of some data is made available. The second copy's contents are initially identical to those of the first copy. The second copy is made available instantly, which means, in practical terms, that the second copy is made available in much less time than would be required to create a true, separate, physical copy, and that the second copy can be established without any unacceptable disruption to the operations of an application that is using the first copy.
Once established, the second copy can be used for a number of purposes, including backups, system trials and data mining. The first copy continues to be used for its original purpose by its associated application. If a backup were to be performed without Flash Copy, the application would have to be shut down before the backup could be taken, and then restarted afterwards. Because it is becoming increasingly difficult to find a time window during which an application is sufficiently idle to be shut down, the cost of performing a backup is relatively high. As such, there is significant value in the ability of Flash Copy to allow backups to be taken without stopping any ongoing application.
Flash Copy achieves the illusion of the existence of a second copy by redirecting read I/O addressed to the second copy (henceforth referred to as “Target”) to the original copy (henceforth referred to as “Source”), unless that region has been subject to a write. When a region has been the subject of a write (to either Source or Target), then in order to maintain the illusion that both Source and Target own their own copy of the data, a process is invoked that suspends the write command before it has taken effect, issues a read of the affected region from the Source, applies the read data to the Target with a write, and then (only if all steps were successful) releases the suspended write. Subsequent writes to the same region do not need to be suspended since the Target already has its own copy of the data. Such a copy-on-write technique is well-known and is used in many environments.
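The copy-on-write behavior described above can be illustrated with a minimal sketch. It is not taken from any particular implementation: the class name FlashCopy, its method names, the use of in-memory lists as stand-ins for disks, and the assumption that every write replaces a whole region are all hypothetical choices made for illustration.

```python
class FlashCopy:
    """Toy point-in-time copy over in-memory 'disks' (hypothetical sketch).

    Each disk is a list with one entry per region; a write is assumed to
    replace a whole region.
    """

    def __init__(self, source):
        self.source = source                  # Source disk, one payload per region
        self.target = [None] * len(source)    # Target starts with no physical data
        self.copied = [False] * len(source)   # metadata: has the region been copied?

    def read_source(self, region):
        return self.source[region]

    def read_target(self, region):
        # Reads of the Target are redirected to the Source unless the
        # region has already been copied.
        if self.copied[region]:
            return self.target[region]
        return self.source[region]

    def _copy_on_write(self, region):
        # While the incoming write is held back: read the region from the
        # Source, write it to the Target, then record the copy in the metadata.
        if not self.copied[region]:
            data = self.source[region]
            self.target[region] = data
            self.copied[region] = True

    def write_source(self, region, data):
        self._copy_on_write(region)
        self.source[region] = data            # release the suspended write

    def write_target(self, region, data):
        self._copy_on_write(region)
        self.target[region] = data            # release the suspended write


fc = FlashCopy(["a", "b", "c"])
fc.write_source(0, "A")    # Target keeps the point-in-time value "a"
fc.write_target(1, "B")    # Source keeps its own value "b"
```

After these two writes, regions 0 and 1 are marked as copied and region 2 is still served from the Source when read through the Target. A real controller would also have to handle partial-region writes and failures between the steps, which this sketch omits.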
There are many variations as to how Flash Copy can be implemented. These variations show through in the various features of an implementation. For example, some implementations allow reads and writes to the Target, while others only allow reads. Some implementations allow only limited updates to the Target, and some require the Target to be the same size as the Source, while others permit it to be smaller.
However, all implementations rely on some data structure that governs the above decisions, namely, the decision as to whether reads received at the Target are issued to the Source or the Target, and the decision as to whether a write must be suspended to allow the copy-on-write to take place. The data structure essentially tracks the regions that have been copied from the Source to the Target, as distinct from those that have not.
Maintenance of such a data structure (hereinafter called metadata) is the key to implementing the algorithm behind Flash Copy. Other advanced functions such as Remote Copy (also known as continuous copy or remote mirroring) or Virtualization rely on similar data structures. The metadata for each of those advanced functions differs, but in all cases, it is used to maintain state information, such as the location of data, the mapping of virtualized files to real storage objects, etc. The metadata is held in persistent storage.
A function such as Flash Copy is relatively straightforward to implement within a single processor complex, as is often employed within modern storage controllers. With a little extra effort, it is possible to implement fault tolerant Flash Copy, such that two or more processor complexes can have access to a copy of the metadata. Thus, in the event of a failure of the first processor complex, the second processor complex can be used to continue operation, without loss of access to the Target Image.
However, the I/O capability of a single processor complex is limited. The capability of a single processor complex, measured in terms of either I/Os per second or bandwidth (in megabytes per second), has a finite limit, and thus will eventually impose a constraint on the performance of user applications. Such a limit arises in many implementations of Flash Copy, but a good example is in storage controllers. A typical storage controller has a single processor complex that dictates a limit on the performance capability of that storage controller.
Additional storage controllers can be installed. However, the separate storage controllers do not share access to the metadata, and therefore do not cooperate in managing a Flash Copy image. The storage space becomes fragmented, with functions such as Flash Copy being confined to the scope of a single controller system. Both Source and Target disks must be managed within the same storage controller. The disk space of a single storage controller may become full while another controller has spare space, but it is not possible to separate the Source and Target disks by placing the Target disk under the control of the new controller. This is particularly unfortunate in the case of a new Flash Copy, where moving the Target is a cheap operation, as it has no physical data associated with it.
As well as constraining the total performance possible for a Source/Target pair, the constraint of single controller storage functions adds complexity to the administration of the storage environment. The administrative cost is often cited as the biggest cost in the total cost of ownership of storage. It would be significantly advantageous to reduce system cost and complexity by removing all possible arbitrary constraints.
A simple method of allowing multiple controllers to participate in a shared Flash Copy relationship is to assign one controller as the Owner of the metadata, and have the other controllers forward all read and write requests to that controller. The owning controller processes the requests as if they come directly from its own attached host servers, using the algorithm described above, and completes each request back to the originating controller. The main drawback of such a method is that the burden of forwarding each I/O request is too great, possibly doubling the total system-wide cost, and hence approximately halving the system performance.
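The doubling effect of forwarding can be seen with a rough cost model. The model below is only an illustrative assumption, not a measurement: it supposes that handling one I/O request costs one unit of work on whichever controller touches it, so a forwarded request costs one unit on the originating controller and one more on the Owner. The function names and the controller labels "A" and "B" are hypothetical.

```python
def system_cost(io_counts, owner, unit_cost=1.0):
    """Total work for a set of host I/Os under single-Owner forwarding.

    io_counts maps a controller name to the number of host I/Os it receives.
    Assumption: each controller that touches a request spends unit_cost on it.
    """
    total = 0.0
    for controller, count in io_counts.items():
        if controller == owner:
            total += count * unit_cost        # processed locally by the Owner
        else:
            total += count * 2 * unit_cost    # originator work plus Owner work
    return total


def relative_throughput(io_counts, owner):
    """Throughput relative to an ideal system in which no I/O is forwarded."""
    ideal_cost = sum(io_counts.values()) * 1.0
    return ideal_cost / system_cost(io_counts, owner)


# All host I/O arrives at controller "B", but "A" owns the metadata.
all_forwarded = {"A": 0, "B": 100}
cost = system_cost(all_forwarded, owner="A")
ratio = relative_throughput(all_forwarded, owner="A")
```

Under this assumed model, when every request must be forwarded the total cost is twice the number of requests and the relative throughput is one half, which matches the worst case described above. When some requests arrive directly at the Owner, the penalty is smaller.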
It is known in the field of distributed parallel database processing to have a distributed lock management facility that enables resources to be assembled into lock clubs and to assign lock club owners that in turn control all locking for their assigned regions and issue locking control messages to I/O-requesting clients. Such a system is implemented at the logical resource level, and does not allow for control of locks among storage controller systems, nor does it provide any form of lock management at a metadata level. It also introduces considerable overhead in the case of storage virtualization, when real data segments may be held in widely distributed physical media.
It has been suggested in the academic literature, for example, in “Scalable Concurrency Control and Recovery for Shared Storage Arrays” by Amiri et al., that it is possible to use distributed lock management at the device level in storage controller networks. However, it is known that such lock management techniques are inhibited by the burden of messaging that must take place among the storage controllers, leading to long latency periods and increased potential for deadlocks and repeated retries.
Consequently, it would be desirable to provide a low-cost, high-performance, scalable scheme that allows for multiple storage controllers to participate in advanced storage functions where the ownership and locking at a metadata level can be distributed among any one of the storage controllers.