This invention pertains generally to a computer structure and method that provides a plurality of controllers using operational primitives in a message passing multi-controller non-uniform workload environment, and more particularly to a RAID computer architecture employing this structure and method.
Modern computers require a large, fault-tolerant data storage system. One approach to meeting this need is to provide a redundant array of independent disks, or RAID, operated by a disk array controller. A conventional disk array controller consists of several individual disk controllers combined with a rack of drives to provide a fault-tolerant data storage system that is directly attached to a host computer. The host computer is then connected to a network of client computers to provide a large, fault-tolerant pool of storage accessible to all network clients. Typically, the disk array controller provides the brains of the data storage system, servicing all host requests, storing data to multiple (RAID) drives, caching data for fast access, and handling drive failures without interrupting host requests.
Traditionally, the storage pool is increased by adding independent racks of disk drives and disk array controllers, all of which require new communications channels to the host computer. One problem with this conventional configuration is that adding racks of disk drives to the network configuration typically requires considerable intervention on the part of the system administrator. Also, because the disk array controllers are independent, there is no provision for automatically distributing a workload across the available controllers; the burden of determining how best to attach and utilize the I/O processing resources falls upon the person responsible for setting up the system. Moreover, if the utilization of the I/O processors changes for any reason, the system utilization may no longer be optimal.
An additional drawback of this conventional architecture is that while adding more subsystems adds more storage capacity to the system, it does not necessarily add processing capability. This is generally the case because all controllers work independently, with no cooperation among them.
Some recent attempts to produce high-performance RAID systems having improved system utilization have used a single high-performance, monolithic controller. Because there is one controller, there is no possibility of an unbalanced workload between multiple independent controllers. Although the system utilization is improved, the cost of building the high-performance, monolithic controller dramatically increases the cost of the RAID system, which in the competitive computer memory market is highly undesirable. Another, more fundamental problem with this approach, as with all single controller systems, is that the failure of a single element, i.e., the controller, renders the entire RAID system inoperable.
Dual active controllers were implemented to circumvent this problem of a single point of failure that all single controller RAID systems exhibit. Dual active controllers are connected to each other through a special communications channel as a means of detecting whether the alternate controller has malfunctioned. Controller redundancy is provided by allowing a single controller in a system to fail and then having its workload picked up by the surviving controller. In order for one controller to take over the tasks that were being performed by the failed controller, it must have information on what work was in progress on the controller that failed. To keep track of the work the partner controller is performing, messages are passed between the two controllers to mirror host writes and to send configuration information back and forth. To fulfill these two requirements, two classes of controller-to-controller messages are required: data messages and configuration messages. The data messages can be considered static, in that the information contained within the message is not generally processed until the partner controller fails. The configuration messages can be considered dynamic, in that they are processed by the receiving controller immediately upon receipt and cause a change in the receiving controller's state. Although dual active controllers eliminate the problems caused by failure of a controller in earlier systems with multiple independent controllers or a single monolithic controller, they still suffer from one of the same drawbacks, namely, that there is no provision that would allow the controllers to distribute the workload across the controllers, and therefore the system utilization is not optimal.
Therefore, there remains a need to overcome the above limitations in the existing art, which is satisfied by the inventive structure and method described hereinafter. In particular, there is a need for a memory system comprising a plurality of disk array controllers in which a failure of one or more of the controllers does not render the system inoperative or any of the data stored in the system inaccessible. There is also a need for a memory system to which additional controllers or memory arrays can be added to increase processing capabilities. There is a further need for a memory system having an architecture which does not require extensive alterations either to the software or the hardware to expand the system.
Heretofore, RAID system performance and fault tolerance have been limited by the use of one or two independent controllers. This invention provides structure and method for an efficient architecture allowing n-controllers to work together to improve computer and disk system performance and fault tolerance, when n is greater than two.
The invention provides a new type of RAID architecture using operational primitives in a message passing multi-controller environment to solve the problems presented in having multiple controllers distribute a non-uniform workload. In simple terms, the inventive technique breaks input/output (I/O) operations into a set of simple methods which can then be passed around as tokens, or pieces of work to be executed by whichever controller has the least amount of work to perform. The advantage of this type of architecture is that additional processing resources can be added to the system to address specific areas which need higher throughput without the need to rethink the software architecture.
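By way of illustration only, the following minimal sketch shows one way an I/O operation might be broken into tokens that are each handed to whichever controller currently has the least work. The names `Token`, `Controller`, `split_write` and `assign`, the five primitives chosen for a write, and the relative `cost` values are all hypothetical and are not taken from the specification; ordering dependencies between primitives are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    op: str        # hypothetical primitive name, e.g. "read_old_data:7"
    cost: int = 1  # assumed relative amount of work for this primitive


@dataclass
class Controller:
    name: str
    pending: list = field(default_factory=list)

    def load(self) -> int:
        # Outstanding work is approximated as the sum of pending token costs.
        return sum(t.cost for t in self.pending)


def split_write(block: int) -> list:
    """Break one write into simple primitives (an assumed decomposition)."""
    return [
        Token(f"read_old_data:{block}"),
        Token(f"read_old_parity:{block}"),
        Token(f"compute_parity:{block}", cost=2),
        Token(f"write_data:{block}"),
        Token(f"write_parity:{block}"),
    ]


def assign(tokens, controllers):
    """Give each token to the controller with the least pending work."""
    for tok in tokens:
        target = min(controllers, key=Controller.load)
        target.pending.append(tok)


controllers = [Controller("A"), Controller("B"), Controller("C")]
assign(split_write(7), controllers)
for c in controllers:
    print(c.name, c.load(), [t.op for t in c.pending])
```

In this sketch a further controller could be appended to `controllers` without changing `split_write` or `assign`, which mirrors the stated advantage that processing resources can be added without rethinking the software architecture.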
The present invention is directed to a memory system for controlling a data storage system, the memory system comprising a plurality of memory controllers coupled by a communications path. The memory controllers are adapted to dynamically distribute tokens to be executed amongst the memory controllers via the communications path. The communications path is a high-speed channel connected directly between the memory controllers, and can comprise one or more of a fibre channel, a small computer system interface, or a mercury interconnect. Preferably, each of the memory controllers comprises a shared-memory controller and the communications path is coupled to the memory controller through the shared-memory controller. More preferably, the shared-memory controller comprises a computer readable medium with a computer program stored therein for dynamically distributing tokens amongst the memory controllers. The memory system of the present invention is particularly suited for use in a networked computer system comprising a server computer coupled to a plurality of client computers, and in which the data storage system comprises a plurality of disk drives in a RAID configuration.
In another aspect, the invention is directed to a computer program product for dynamically distributing tokens amongst a plurality of memory controllers. The memory controllers are adapted to control a data storage system, and to transfer data between the data storage system and at least one host computer in response to an instruction from the host computer. The computer program comprises (i) a dispatch unit for receiving, from a host computer, at least one token which is ready to be executed and storing the token in a token ready queue, (ii) an execution unit for taking from the token ready queue a token which the memory controller is qualified to perform, instructing the associated memory controller to perform the token, and transmitting a completion signal to other memory controllers, and (iii) an interprocessor message control unit for transmitting tokens, data, and completion signals between memory controllers. Preferably, each of the memory controllers comprises a computer readable medium with the computer program stored therein. In one embodiment, the computer program further comprises a token generation unit for parsing an instruction from the host computers into component procedures which are communicated as tokens. In yet another embodiment, at least one of the host computers comprises a computer readable medium having an instruction program stored therein, and the instruction program comprises a token generation unit for parsing an instruction from the host computer into component procedures which are communicated as tokens.
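The cooperation of the three units recited above may be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed program: the class names `MessageBus` and `MemoryController`, the `skills` set used to model which tokens a controller is qualified to perform, and the rule that a controller drops a token from its own queue on receiving a completion signal are all hypothetical.

```python
from collections import deque


class MessageBus:
    """Stand-in for the interprocessor message control unit (hypothetical)."""

    def __init__(self):
        self.controllers = []

    def attach(self, controller):
        self.controllers.append(controller)

    def broadcast_completion(self, sender, token_id):
        # Send the completion signal to every controller except the sender.
        for c in self.controllers:
            if c is not sender:
                c.on_completion(token_id)


class MemoryController:
    def __init__(self, name, skills, bus):
        self.name = name
        self.skills = set(skills)  # token kinds this controller may perform
        self.ready = deque()       # token ready queue
        self.done = []
        self.bus = bus
        bus.attach(self)

    def dispatch(self, token_id, kind):
        """Dispatch unit: store a ready token in the token ready queue."""
        self.ready.append((token_id, kind))

    def execute_next(self):
        """Execution unit: perform the first token this controller is qualified for."""
        for entry in list(self.ready):
            token_id, kind = entry
            if kind in self.skills:
                self.ready.remove(entry)
                self.done.append(token_id)
                self.bus.broadcast_completion(self, token_id)
                return token_id
        return None  # nothing in the queue that this controller can perform

    def on_completion(self, token_id):
        # Assumed behavior: forget a token that another controller has completed.
        self.ready = deque(e for e in self.ready if e[0] != token_id)


bus = MessageBus()
a = MemoryController("A", {"read"}, bus)
b = MemoryController("B", {"read", "xor"}, bus)
for ctrl in (a, b):
    ctrl.dispatch(1, "xor")
    ctrl.dispatch(2, "read")

a.execute_next()  # A is not qualified for "xor", so it performs token 2
b.execute_next()  # B performs token 1; token 2 was already signalled complete
```

Here every controller holds the same queue contents, and the completion signal is what keeps a token from being performed twice; a real implementation would also need to resolve two controllers selecting the same token at the same time, which this sketch does not address.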
In yet another aspect, the present invention is directed to a method for operating a memory system comprising a plurality of memory controllers, the memory system adapted to transfer data between a data storage system and one or more host computers in response to instructions therefrom. In the method, the plurality of memory controllers are coupled with a communications path. An instruction from a host computer is parsed to identify at least one instruction component procedure. A token representing each instruction component procedure is broadcast to the memory controllers and stored in a token ready queue in each of the memory controllers. Preferably, the method comprises the further step of dynamically distributing the tokens amongst the memory controllers via the communications path to balance a workload on each of the memory controllers.
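The parsing and broadcasting steps of the method may be illustrated by the following short sketch. The textual instruction format `"OP first_block count"`, the per-block component procedures, and the function names `parse_instruction` and `broadcast` are assumptions made purely for illustration; the specification does not prescribe them.

```python
from collections import deque


def parse_instruction(instruction):
    """Split a hypothetical 'OP first_block count' instruction into per-block procedures."""
    op, first, count = instruction.split()
    return [f"{op.lower()}_block:{int(first) + i}" for i in range(int(count))]


def broadcast(tokens, ready_queues):
    """Store a token for each component procedure in every controller's ready queue."""
    for queue in ready_queues.values():
        queue.extend(tokens)


# One token ready queue per memory controller, keyed by an assumed controller name.
ready_queues = {name: deque() for name in ("A", "B", "C")}
tokens = parse_instruction("WRITE 10 3")
broadcast(tokens, ready_queues)
```

After the broadcast, each queue holds the same three tokens, so any controller is in a position to take up any of the component procedures.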