This invention relates generally to storage systems associated with computer systems and more particularly to a method and apparatus for providing error information relevant to an entire storage system.
As is known in the art, computer systems generally include a central processing unit, a memory subsystem and a storage subsystem. The storage subsystem, whether associated with or in addition to a local computer system, may include a large number of independent storage devices or disks housed in a single enclosure. This array of storage devices is typically connected to several computers (or hosts) via dedicated cabling or via a network. Such a model allows for the centralization of data which is to be shared among many users and also provides a single point of maintenance for the storage functions associated with the many computer systems.
One type of storage system known in the art is one which includes a number of disk storage devices configured as an array (sometimes referred to as RAID). Such a system may include several arrays of storage devices. In addition to the arrays of storage devices, typical storage systems include several types of controllers for controlling the various aspects of the data transfers associated with the storage system. One type of controller is a host controller, which provides the interface between the host computers and the storage system. Another type of controller is a disk controller. There may be one or more disk controllers for each array of storage devices in a storage system. The function of a disk controller is to manage the transfer of data to and from its associated array of drives.
In addition to the controllers described above, advanced storage systems, such as the SYMMETRIX® storage systems manufactured by EMC Corporation, may include a very large memory which is coupled to each of the controllers in the system. The memory may be used as a staging area (or cache) for the data transfers between the storage devices and the host computers and may provide a communications path between the various controllers. Such systems provide superior performance to non-cache storage systems. In addition to the basic functional blocks described above, a storage system will typically include other components such as an enclosure, power supplies, cooling fans, service processors, communications equipment, etc.
The storage systems described above may be capable of servicing requests from different types of host computers, i.e. mainframe and open system computers. The communications path between the open system type computers and the storage system is one which typically adheres to the Small Computer System Interface (SCSI) communication protocol. That is, communication between the open system computers and the storage system occurs using a set of commands which are defined in the protocol. For example, in order for a host computer to read data from the storage system, it will typically send specific SCSI commands to its associated host controller within the storage system. The host controller interprets the commands and causes the appropriate disk controllers to retrieve the data from the corresponding disk devices. Information returned by the disk devices is also defined by the SCSI protocol. In addition to the commands for read and write operations, the SCSI protocol defines a means for the individual disk devices to report error conditions. However, these error conditions are, in most cases, specifically related to the input/output (I/O) operation taking place when the error occurs.
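By way of illustration only, the following Python sketch shows how a host might form one such read command and interpret an error report tied to that command. It assumes the standard 10-byte READ(10) command descriptor block and fixed-format sense data as defined by the SCSI specifications. The function names build_read10_cdb and parse_fixed_sense are hypothetical and are not part of any system described herein.

```python
import struct

READ_10_OPCODE = 0x28  # operation code of the SCSI READ(10) command


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte READ(10) command descriptor block (hypothetical helper)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("logical block address must fit in 32 bits")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("transfer length must fit in 16 bits")
    # Big-endian layout: opcode, flags, 4-byte LBA, group number,
    # 2-byte transfer length, control byte.
    return struct.pack(">BBIBHB", READ_10_OPCODE, 0, lba, 0, num_blocks, 0)


def parse_fixed_sense(sense: bytes) -> dict:
    """Extract the error fields from fixed-format sense data (hypothetical helper)."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data too short")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "sense_key": sense[2] & 0x0F,  # broad error class, e.g. 0x3 = MEDIUM ERROR
        "asc": sense[12],              # additional sense code
        "ascq": sense[13],             # additional sense code qualifier
    }


if __name__ == "__main__":
    cdb = build_read10_cdb(lba=0x1234, num_blocks=8)
    print(cdb.hex())
    # Sense data a device might return for a failed read (illustrative values).
    sense = bytes([0x70, 0x00, 0x03]) + bytes(9) + bytes([0x11, 0x00])
    print(parse_fixed_sense(sense))
```

Note that the sense data in this sketch is returned in response to a particular command, which reflects the limitation noted above: the error report is bound to the I/O operation during which the error occurred.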
In storage systems with large disk arrays, like the SYMMETRIX® storage systems described above, a need arises to be able to report error messages which are related to the entire system, or to report individual device errors which are not related to an immediately occurring I/O. Examples of these errors are controller errors, power failures, cooling fan failures, communications errors, etc. These error reports need to be made available to any one of the host systems attached to the storage system even if the error does not affect the host seeking the information. The error reports would be useful to, for example, application programs running on the host computer which monitor the status of the storage system. The present SCSI protocol does not provide a command in its command set which allows a host computer to retrieve this type of error information.