1. Technical Field:
The present invention relates in general to an improved data processing system and in particular to a fault tolerant data storage subsystem for utilization within a data processing system. Still more particularly, the present invention relates to a data storage subsystem which includes a multipath dynamically alterable hierarchical arrangement of storage device controllers.
2. Description of the Related Art:
A typical data processing system generally includes one or more memory units which are connected to a central processing unit either directly or indirectly through a control unit and a channel. The function of these memory units is to store data and programs which are utilized by the central processing unit in performing a given data processing task.
Various types of memory units are utilized in current data processing systems. The response times and capacities of these different memory types vary significantly, and in order to maximize system throughput the choice of a particular type of memory unit generally involves matching its response time to the requirements of the central processing unit and its capacity to the data storage needs of the data processing system. To minimize the impact on system throughput which results from the utilization of slow access storage devices, many data processing systems employ a number of different types of memory units. Since access time and capacity also affect the cost of storage, a typical system may include a fast access, small capacity, directly accessible monolithic memory for data that is utilized frequently and a series of tape memory devices or disk memory devices which are connected to the system through respective control units for data which is utilized less frequently. The storage capacities of these latter units are generally several orders of magnitude greater than those of the semiconductor memories utilized within data processing systems and hence the storage cost per byte of data is considerably less.
Computer systems are currently being developed in which the amount of data to be manipulated by the system is immense. For example, data storage systems have been proposed that are capable of handling amounts of data on the order of exabytes, spread across hundreds of direct access storage devices (DASD). Individual files as large as ten gigabytes have been proposed for such systems. Such large storage systems should be very reliable since restoring an entire storage system after a fault could take many hours or days. In addition, such storage systems should sustain very high data transfer rates in order to permit efficient utilization of the data.
It should therefore be apparent that no single point of failure should be permitted to cause a large data processing system to lose access to memory.
The prior art has suggested several techniques for solving this problem. The most straightforward approach utilizes a duplicate set of storage devices or memory units which keep a duplicate of all data. While this solution solves the data reliability problem, it doubles the cost of storage and has some impact on system performance, since any change to stored data requires writing two records. It also adds the requirement of keeping track of where the duplicate records are kept, in the event the primary records are not available.
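By way of illustration only, the duplication approach described above may be sketched as follows. The class name `MirroredStore` and its fields are hypothetical and are not drawn from any particular prior art system; the sketch assumes in-memory dictionaries standing in for the primary and duplicate storage devices, and it is simplified in that writes always go to both devices.

```python
class MirroredStore:
    """Hypothetical sketch of full data duplication (mirroring)."""

    def __init__(self):
        self.primary = {}              # stands in for the primary device
        self.mirror = {}               # stands in for the duplicate device
        self.physical_writes = 0       # counts records actually written
        self.primary_available = True  # cleared to simulate a primary fault

    def write(self, key, record):
        # Every logical write becomes two physical writes, one per device.
        for device in (self.primary, self.mirror):
            device[key] = record
            self.physical_writes += 1

    def read(self, key):
        # The store must know where the duplicate lives in order to
        # fall back to it when the primary record is unavailable.
        if self.primary_available:
            return self.primary[key]
        return self.mirror[key]
```

The sketch makes the two costs noted above visible: each logical write increments `physical_writes` twice, and the read path must carry the bookkeeping needed to locate the duplicate record.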
Alternatively, some systems utilize so-called "Error Correction Code" (ECC) bits which are appended to data records or groups of data records. Utilizing error correction code logic together with these bits, it is possible to correct a small amount of data that may have been read erroneously.
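The text above does not identify a particular error correction code. As one well-known example, assumed here purely for illustration, a Hamming(7,4) code appends three check bits to four data bits and permits correction of any single erroneous bit. The function names `encode` and `decode` are hypothetical.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Codeword positions 1..7 hold: p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the bad bit, or 0 if none.
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Such a code corrects only one bit per codeword, which is consistent with the observation above that ECC bits address a small amount of erroneously read data rather than the loss of an entire storage device.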
More recently, the utilization of large redundant arrays of inexpensive disks, commonly known as "RAID," has been proposed. Such arrays offer the opportunity to achieve high data reliability at a lower cost than conventional methods which are based upon complete duplication of data. Various configurations of disk arrays have been proposed utilizing RAID technology and such systems typically provide high data reliability and availability. High data reliability means that the expected time to data loss is very long, and high availability means that the time spent repairing systems and recovering lost data is a small fraction of total time. While these arrays of disks provide enhanced data reliability, such systems are still prone to data loss in the event an error or fault occurs which is of sufficient magnitude to interrupt the recovery of data.
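The text above does not specify a RAID configuration. The following minimal sketch assumes a single-parity arrangement, in which one parity block is computed as the byte-wise exclusive-OR of the data blocks in a stripe, to illustrate how redundancy may be obtained without complete duplication. The function names `xor_blocks` and `rebuild_missing` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block of a stripe.

    XOR-ing the parity block with every surviving data block cancels
    their contributions and leaves the contents of the lost block.
    """
    return xor_blocks(list(surviving_blocks) + [parity])
```

Under this assumption, a stripe of three data blocks requires one additional parity block rather than three duplicate blocks. The sketch also shows the limitation noted above: recovery requires reading every surviving block in the stripe, so a second fault during recovery leaves the lost data unrecoverable.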
One proposed approach for overcoming these problems is set forth in Idleman et al., U.S. Pat. No. 5,140,592. The method and apparatus set forth therein propose enhancing the reliability of disk array systems by utilizing a controller device which is split into a first level and second level controller. The first and second level controllers are interconnected in a manner such that a failure of the second level controller associated with a first level controller will result in a switching of a peer controller into the path utilized by the faulting second level controller. While this system does provide some security from controller faults, the necessity of providing specialized interconnected controllers in which the control function is divided between two different levels of storage device controllers creates an added complexity which is more difficult to implement and maintain.
It should thus be apparent that a need exists for a fault tolerant data storage subsystem which may be simply and efficiently implemented utilizing standard technology and which provides a high degree of reliability within the memory storage subsystem.