This invention relates generally to a computing system for allocating resources in a multiple resource arrangement and, in particular, to a system for allocating resources in a data storage system.
A computing system may be conceptually divided into two general classes of subsystems: (1) the operational subsystems that use system resources to provide the functions the computing system is designed to perform; and (2) the fault management subsystem or subsystems that find, analyze, diagnose and, in some instances, take action to minimize the effect of malfunctioning resources on the overall system. The term "resources" is used herein to describe functional modules within the computing system that may be tested separately or used separately for, e.g., storing, transmitting or manipulating data within the system. Resources are hardware elements and include disk drives, processors, memories, data paths, etc. The operational subsystems and the fault management subsystems compete for access to the system resources.
In prior art computing systems, built-in diagnostic routines in the fault management subsystem may gain exclusive access to a resource for relatively long periods of time in order to test the resource. While the diagnostic routine is running on a particular resource, an operational subsystem may request access to that resource in order to perform a resource operation and to provide services to the computing system. In most computing systems, if the operational subsystem receives no reply to its request within a predetermined period of time, the service routine being performed by the operational subsystem may abort, thereby disrupting service to the computing system. What is needed, therefore, is a computing system that manages access to the resources so that the operation of the fault management subsystems does not cause any interruption of the services provided by the operational subsystems.
In conventional computing systems, the system may become partially inoperable if a resource (such as a data storage module or a data path) fails. What is needed, therefore, is a computing system that can compensate for the failure of a resource to keep the system fully operational.
A significant problem with some prior art storage systems is their use of CPU time on the external computer or computers to identify and diagnose inoperative or malfunctioning storage devices within the storage system and to reroute data paths when one or more data storage devices or other peripherals become inoperable. The use of the external computer to perform these low-level functions could interfere with more important CPU tasks.