1. Technical Field
The present invention relates to managing data storage.
2. Description of Related Art
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more servers or host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units. The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access the single storage device unit allows the host systems to share data in the device. In order to facilitate sharing of the data on the device, additional software on the data storage systems may also be used.
In connection with a write operation, the data storage system may utilize a variety of different techniques such as write-back caching. With write-back caching, the data storage system may temporarily cache data received from a host within its storage cache and destage the cached data at different times onto the physical disk drives. The data storage system may utilize a backup or secondary power supply for use in connection with preventing loss of cached data in the event of a power failure. In the event of a power failure, the data storage system may utilize the backup power supply to provide power to the storage processor and physical data storage devices of the data storage system for a short period of time. During this period of time, the storage processor stores the data from its cache to a dedicated area on the storage devices that may be referred to as a “vault” so that the vault includes the cached data which has not yet been destaged onto the physical storage devices.
It should be noted that a data storage system may include multiple storage processors storing data to a same set of storage devices. Each of the storage processors may have its own cache so that cached data for the write operations, as well as possibly other cached data, may be mirrored in the caches of the storage processors. Multiple storage processors may be desirable for use in providing fault tolerance, higher throughput, and the like.
In a particular example, as is known in the art, large host computer systems require large capacity data storage systems. These large computer systems generally include data processors which perform many operations on data introduced to the computer system through peripherals including the data storage system. The results of these operations are output to peripherals, including the storage system.
In accordance with the example, one type of data storage system is a magnetic disk storage system. Here a bank of disk drives and the computer system are coupled together through an interface. The interface includes “front end” directors (or controllers) and “back end” disk directors (or controllers, also known as rear end directors or disk directors). The interface operates the directors in such a way that they are transparent to the computer. That is, data is stored in, and retrieved from, the bank of disk drives in such a way that the computer system merely thinks it is operating with one large memory. One such system is described in U.S. Pat. No. 5,206,939, entitled “System and Method for Disk Mapping and Data Retrieval”, inventors Moshe Yanai, Natan Vishlitzky, Bruno Alterescu and Daniel Castel, issued Apr. 27, 1993, and assigned to the same assignee as the present invention.
As described in such U.S. Patent, the interface may also include, in addition to the front-end directors and disk directors, an addressable global cache memory. The global cache memory is a semiconductor memory connected to all of the front end directors and back end directors and is provided to rapidly store data from the computer system before storage in the disk drives, and, on the other hand, store data from the disk drives prior to being sent to the computer. The cache memory being a semiconductor memory, as distinguished from a magnetic memory as in the case of the disk drives, is much faster than the disk drives in reading and writing data.
In operation, when the host computer wishes to store end-user (i.e., host computer) data at an address, the host computer issues a write request to one of the front-end directors to perform a write command. One of the front-end directors replies to the request and asks the host computer for the data. After the request has passed to the requesting one of the front-end directors, the director determines the size of the end-user data and reserves space in the cache memory to store the request. The front-end director then produces control signals for such front-end director. The host computer then transfers the data to the front-end director. The front-end director then advises the host computer that the transfer is complete. The front-end director looks up in a Table, not shown, stored in the cache memory to determine which one of the rear-end directors is to handle this request. The Table maps the host computer address into an address in the bank of disk drives. The front-end director then puts a notification in a “mail box” (not shown and stored in the cache memory) for the rear-end director which is to handle the request, the amount of the data and the disk address for the data. Other rear-end directors poll the cache memory when they are idle to check their “mail boxes”. If the polled “mail box” indicates a transfer is to be made, the rear-end director processes the request, addresses the disk drive in the bank, reads the data from the cache memory and writes it into the addresses of a disk drive in the bank. When end-user data previously stored in the bank of disk drives is to be read from the disk drive and returned to the host computer, the interface system operates in a reciprocal manner. The internal operation of the interface (e.g. “mail-box polling”, event flags, data structures, device tables, queues, etc.) is controlled by interface state data (sometimes referred to as metadata) which passes between the directors through the cache memory. Further, end-user data is transferred through the interface as a series of multi-word transfers, or bursts. Each word transfer in a multi-word transfer is here, for example, 64 bits. Here, an end-user data transfer is made up of, for example, 32 bursts. Each interface state data word is a single word having, for example, 64 bits.
In another example, a data storage system has a pair of storage processors connected to an array of disk drives. For example, such a system is disclosed in U.S. Pat. No. 5,922,077, which is hereby incorporated by reference herein, and which describes a dual data storage controller system in which the controllers are connected to one another by a peer-to-peer communication link. Each data storage controller is connected to a fibre channel loop in connection with each of the disk drives in the disk array. Fail-over switches provide each data storage controller with a means for connecting to either one of the fibre channel loops.
Each storage processor has its own write cache memory and the two storage processors may be configured to communicate with each other through a Cache Mirroring Interface (CMI) bus in the peer-to-peer communication link in order to maintain cache coherency as well as to minimize the impact of cache mirroring disk writes. In particular, the CMI bus enables a copy of data to be available on both storage processing units before the disk write operation is complete. In this system, a first storage processing unit has a first CMI interface circuit, a second storage processing unit has a second CMI interface circuit, and the first and second CMI interface circuits connect to each other through the CMI bus.
As is also known in the art, a disk drive contains at least one magnetic disk which rotates relative to a read/write head and which stores data nonvolatilely. Data to be stored on a magnetic disk is generally divided into a plurality of equal length data sectors. A typical data sector, for example, may contain 512 bytes of data. A disk drive is capable of performing a write operation and a read operation. During a write operation, the disk drive receives data from a host computer (e.g., here, a back end director) along with instructions to store the data to a specific location, or set of locations, on the magnetic disk. The disk drive then moves the read/write head to that location, or set of locations, and writes the received data. During a read operation, the disk drive receives instructions from a host computer to access data stored at a specific location, or set of locations, and to transfer that data to the host computer. The disk drive then moves the read/write head to that location, or set of locations, senses the data stored there, and transfers that data to the host.
The host computer, which for some purposes may include the storage system itself, may not address the disk drives of the storage system directly, but rather access to data may be provided to one or more host computers from what the host computers view as a plurality of logical devices or logical volumes (LVs), also referred to as LUNs. The LUNs may or may not correspond to the actual disk drives. For example, one or more LUNs may reside on a single physical disk drive. In another example, a LUN may use storage space from multiple physical disk drives. An LV or LUN (logical unit number) may be used to refer to the foregoing logically defined devices or volumes.
In the industry there have become defined several levels of RAID systems. The first level, RAID-0, combines two or more drives to create a larger virtual disk. In a dual drive RAID-0 system one disk contains the low numbered sectors or blocks and the other disk contains the high numbered sectors or blocks, forming one complete storage space. RAID-0 systems generally interleave the sectors of the virtual disk across the component drives, thereby improving the bandwidth of the combined virtual disk. Interleaving the data in that fashion is referred to as striping. RAID-0 systems provide no redundancy of data, so if a drive fails or data becomes corrupted, no recovery is possible short of backups made prior to the failure.
RAID-1 systems include one or more disks that provide redundancy of the virtual disk. One disk is required to contain the data of the virtual disk, as if it were the only disk of the array. One or more additional disks contain the same data as the first disk, providing a “mirror” of the data of the virtual disk. A RAID-1 system will contain at least two disks, the virtual disk being the size of the smallest of the component disks. A disadvantage of RAID-1 systems is that a write operation must be performed for each mirror disk, reducing the bandwidth of the overall array. In a dual drive RAID-1 system, the first disk and the second disk contain the same sectors or blocks, each disk holding exactly the same data.
RAID-2 systems provide for error correction through hamming codes. The component drives each contain a particular bit of a word, or an error correction bit of that word. RAID-2 systems automatically and transparently detect and correct single-bit defects, or single drive failures, while the array is running. Although RAID-2 systems improve the reliability of the array over other RAID types, they are less popular than some other systems due to the expense of the additional drives, and redundant onboard hardware error correction.
RAID-4 systems are similar to RAID-0 systems, in that data is striped over multiple drives. For example, the storage spaces of two disks are added together in interleaved fashion, while a third disk contains the parity of the first two disks. RAID-4 systems are unique in that they include an additional disk containing parity. For each byte of data at the same position on the striped drives, parity is computed over the bytes of all the drives and stored to the parity disk. The XOR operation is used to compute parity, providing a fast and symmetric operation that can regenerate the data of a single drive, given that the data of the remaining drives remains intact. RAID-3 systems are essentially RAID-4 systems with the data striped at byte boundaries, and for that reason RAID-3 systems are generally slower than RAID-4 systems in most applications. RAID-4 and RAID-3 systems therefore are useful to provide virtual disks with redundancy, and additionally to provide large virtual drives, both with only one additional disk drive for the parity information. They have the disadvantage that the data throughput is limited by the throughput of the drive containing the parity information, which must be accessed for every read and write operation to the array.
RAID-5 systems are similar to RAID-4 systems, with the difference that the parity information is striped over all the disks with the data. For example, first, second, and third disks may each contain data and parity in interleaved fashion. Distributing the parity data generally increases the throughput of the array as compared to a RAID-4 system. RAID-5 systems may continue to operate though one of the disks has failed. RAID-6 systems are like RAID-5 systems, except that dual parity is kept to provide for normal operation if up to the failure of two drives.
Combinations of RAID systems are also possible. For example, a four disk RAID 1+0 system provides a concatenated file system that is also redundant. The first and second disks are mirrored, as are the third and fourth disks. The combination of the mirrored sets forms a storage space that is twice the size of one individual drive, assuming that all four are of equal size. Many other combinations of RAID systems are possible.
In at least some cases, when a LUN is configured so that its data is written across multiple disk drives in the striping technique, the LUN is operating in RAID-0 mode. Alternatively, if the LUN's parity information is stored on one disk drive and its data is striped across multiple other disk drives, the LUN is operating in RAID-3 mode. If both data and parity information are striped across multiple disk drives, the LUN is operating in RAID-5 mode.
It is also common practice for a data storage system to include a hot spare disk drive. When a regular disk drive fails, the hot spare disk drive kicks in by taking over the role of the failing disk drive. For example, the storage control circuitry stores a copy of the data that currently exists on the failing disk drive onto the hot spare disk drive. The storage control circuitry then operates the hot spare disk drive in place of the failing disk drive. Typically, the failing disk drive is then removed from the data storage system, discarded by a technician, and may subsequently be replaced by another disk drive.