A traditional approach to high performance data storage in modern computer systems has been Direct Access Storage Devices ("DASD"), characterized by a single, large, expensive disk drive ("SLED") attached to a host by some standard interface bus. Included in the numerous drawbacks to this approach is a lack of redundancy in the system which means that data is not available when there is a problem with the drive or media. SLED drive data transfer times tend to be orders of magnitude slower than host system data transfer times. The host is moving data at high performance electronic speeds whereas drive data transfer speeds are limited by disk drive physical constraints such as rotational speed, seek time, etc. Thus, typically, DASD has been a primary factor limiting overall information system operating speeds.
Solutions to the relative slowness of SLED have been implemented to take advantage of the fact that very often data required from a drive will be located adjacent to the last data read, a phenomenon referred to as "locality of reference", and to take advantage of the fact that in some cases data required will have been accessed previously in a recent data access, a phenomenon known as "data reuse". Such solutions have involved configuring a fast memory buffer or cache between the host and the drive to yield significant improvements in system performance. When the host requests data, more data than has been requested will be fetched ("prefetching") and held in cache in anticipation that the next read will require adjacent data, in which case it can be retrieved very quickly from cache. Similarly on a write, once the host has transferred data to cache the transfer is complete as far as the host is concerned. The data can then be de-staged from cache to the drive at a convenient time.
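The caching behavior described above can be illustrated with a minimal sketch. The class name `PrefetchCache`, its `prefetch` parameter, and the use of a Python list to stand in for the drive are assumptions made for illustration only; they do not describe any particular product.

```python
class PrefetchCache:
    """Toy cache placed between a host and a slow drive (illustrative only)."""

    def __init__(self, backing, prefetch=4):
        self.backing = backing      # list standing in for the drive's blocks
        self.prefetch = prefetch    # extra adjacent blocks fetched on a miss
        self.cache = {}             # block number -> cached data
        self.dirty = set()          # blocks written by the host, not yet de-staged
        self.drive_reads = 0        # number of read accesses that reached the drive

    def read(self, block):
        if block not in self.cache:
            # Miss: one drive access fetches the block plus adjacent blocks,
            # anticipating locality of reference.
            self.drive_reads += 1
            end = min(block + 1 + self.prefetch, len(self.backing))
            for b in range(block, end):
                # setdefault keeps newer host data that is already cached.
                self.cache.setdefault(b, self.backing[b])
        return self.cache[block]

    def write(self, block, data):
        # The host regards the write as complete once the data is in cache.
        self.cache[block] = data
        self.dirty.add(block)

    def destage(self):
        # At a convenient later time, dirty blocks are written to the drive.
        for b in sorted(self.dirty):
            self.backing[b] = self.cache[b]
        self.dirty.clear()
```

In this sketch, five sequential reads starting at block 0 reach the drive only once, and a write does not touch the drive until `destage()` is called.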
Known implementations of cache technology in storage systems include systems referred to as "Integrated Cached Disk Arrays" ("ICDAs"), which replaced the single large expensive disk drive (SLED) with an array of smaller inexpensive disk drives integrated into a single chassis. The high speed caches implemented in ICDAs yielded improved performance. One family of known ICDA products, known as SYMMETRIX and produced by EMC Corporation, Hopkinton, Mass., provides a high reliability array of drives and offers great flexibility in terms of performance enhancements such as: mirroring; greater data availability; greater data transfer rates over distributed buses; and various levels of redundancy implemented in systems referred to as "RAID systems" ("Redundant Arrays of Inexpensive Disks").
The EMC² Symmetrix architecture, illustrated in FIG. 1, integrates a high speed cache or global memory between a disk array and a host computer or CPU. The functional elements generally required to integrate the cache include a host-to-cache interface (which in one implementation is an IBM standard Bus and Tag interface, referred to as a host "Channel Adapter" or CA), and a cache-to-disk drives interface (which is a Small Computer Systems Interface, "SCSI", referred to as a "Disk Adapter" or DA). The Symmetrix architecture operates under a "cache all" policy, meaning that all transfers, i.e. from the host to the drives or from the drives to the host, go through cache. The principal function of the hardware elements is the movement of data between the host and Global Memory (cache) or between Global Memory and the Disk Drives. The SYMMETRIX family of ICDAs is described in detail in the Symmetrix Product Manuals (for Models 5500, 52XX, 5100, 3500, 32XX and 3100), which are incorporated herein by reference.
The Global Memory Bus (GMB), between the CA and cache and between the DA and cache in Symmetrix, actually consists of two identical buses or portions designated "A" and "B". The use of two buses improves performance and eliminates a possible single point of failure. Plural sets of Channel Adapters and Disk Adapters, which are generically referred to as "Directors" or "Control Units" (CUs), are assigned to a particular bus based on a physical slot number in a system chassis. The number of available slots is a function of the system type within the family; however, all systems use the dual A and B Global Memory Buses with alternate slots on each bus. Even numbered slots are on the A bus portion of the GMB while odd numbered slots are on the B bus. Each system board on the GMB identifies its position in the chassis by reading a 5-bit slot code encoded on the backplane (referred to as the "SLOT ID"). Each bus has independent arbitration and consists of a 32 bit address bus plus 1 parity bit, a 64 bit data bus with 8 Error Correction Code (ECC) check bits, and a number of control lines with parity. The smallest data transfer that may take place over the GMB is 64 bits during each access (however, byte, word and longword operations are performed within a Director).
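The slot-to-bus assignment described above reduces to a parity test on the slot number. The following is a minimal sketch; the function name `bus_for_slot` and the constant names are hypothetical and chosen only for illustration.

```python
SLOT_ID_BITS = 5  # width of the backplane slot code, per the text


def bus_for_slot(slot_id):
    """Return 'A' for even-numbered slots and 'B' for odd-numbered slots."""
    if not 0 <= slot_id < (1 << SLOT_ID_BITS):
        raise ValueError("slot ID must fit in 5 bits")
    return "A" if slot_id % 2 == 0 else "B"


# Signal counts per bus as stated in the text (control lines omitted).
ADDRESS_LINES = 32 + 1   # 32 address bits plus 1 parity bit
DATA_LINES = 64 + 8      # 64 data bits plus 8 ECC check bits
```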
The Channel interface(s) and SCSI interface(s), i.e. the bus(es) between the host and CAs and between the disk array and DAs respectively, are 8 bit interfaces. Thus received bytes have to be assembled into 64-bit memory words for transfer to Global Memory, i.e. cache. Similarly, 64-bit memory words from Global Memory have to be disassembled into bytes for transmission over the SCSI or Channel interfaces. This function is carried out by plural gate arrays located on each Director.
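The assembly and disassembly function can be sketched in software as follows. The text does not state the byte order used within a memory word, so the big-endian ordering here is an assumption, as are the function names.

```python
WORD_BYTES = 8  # one 64-bit memory word


def assemble_words(data):
    """Pack a byte stream into 64-bit words (big-endian order is assumed)."""
    if len(data) % WORD_BYTES:
        raise ValueError("byte count must be a multiple of 8")
    return [int.from_bytes(data[i:i + WORD_BYTES], "big")
            for i in range(0, len(data), WORD_BYTES)]


def disassemble_words(words):
    """Unpack 64-bit words back into the original byte stream."""
    return b"".join(w.to_bytes(WORD_BYTES, "big") for w in words)
```

Sixteen bytes received from an 8 bit interface thus become two memory words, and disassembling those words reproduces the original bytes.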
The Symmetrix system is designed around a pipelined architecture. A pipeline or pipe in this system is a registered path along which data is clocked to move it from one location to another. Channel Adapters, each implemented on a dedicated board that includes two pipelines of this type, make use of pipeline hardware referred to as the "Channel pipe", which moves data between the host Channel and Global Memory. The Channel pipe stages, illustrated in FIG. 1B, include channel receive and transmit FIFOs that implement the Bus and Tag interface. A channel gate array stage functions to assemble/disassemble 64 bit memory words for byte transfers between the Channel interface and a Dual Port RAM (DPR), via Error Detection and Correction (EDAC) circuitry (the EDAC is implemented with a standard IDT49C465A).
Disk Adapters make use of pipeline hardware referred to as the "SCSI pipe" (two SCSI pipes per DA board) to move data between the Global Memory and SCSI Disk Drives. The SCSI pipe stages, illustrated in FIG. 1C, include a SCSI interface chip (NCR 53C94) that implements the standard bidirectional interface according to the SCSI protocol. A SCSI gate array stage functions to assemble/disassemble 64 bit memory words for byte transfers between the SCSI chip and the Dual Port RAM (DPR), via Error Detection and Correction circuitry.
Each Director includes a Motorola 68030 microprocessor to control its onboard pipelines moving data between the host Channel and Global Memory (if it is a CA), or moving data between the Global Memory and the disk array (if it is a DA). The 68030 microprocessor also requires access to Global Memory, which it cannot directly address. Two pipes are provided on each Director to give the 68030 access to Global Memory. A Direct Single Access (DSA) pipe facilitates transfers of a single memory word at a time and is typically used for control/status type operations. A Direct Multiple Access (DMA) pipe can transfer from 1 to 8 memory words on each memory access and is thus more efficient for transferring larger blocks of data. The DMA pipe is typically used by the 68030 to transfer large amounts of data for testing/setting up Global Memory.
The principal stages of the DMA/DSA pipes are illustrated in FIG. 1D. A Memory Data Register (MDR) provides a 72 bit wide register set, 64 bits of data and 8 bits of parity, comprised of upper and lower words that can be independently read or written by the 32 bit 68030 microprocessor. The MDR performs the assembly and disassembly of 64 bit Global Memory words (a function performed by the Gate Arrays in the Channel and SCSI pipes). The MDR is also implemented to facilitate byte swapping for data value compatibility between the data values stored in Global Memory by the host and corresponding data processed according to Motorola 68030 byte conventions. The DMA/DSA pipes include the EDAC stage, the Dual Port RAM, and Global Memory.
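The role of the MDR as a 64 bit register accessed in two 32 bit halves can be modeled roughly as follows. This is a sketch under stated assumptions: the class and method names are hypothetical, the parity bits are omitted, and the text does not specify the exact byte-swapping scheme, so a full reversal of the four bytes in a 32 bit value is assumed.

```python
MASK32 = 0xFFFFFFFF


class MemoryDataRegister:
    """Toy model of a 64-bit register accessed as two 32-bit halves."""

    def __init__(self):
        self.upper = 0  # most significant 32 bits
        self.lower = 0  # least significant 32 bits

    def write_upper(self, value):
        self.upper = value & MASK32

    def write_lower(self, value):
        self.lower = value & MASK32

    def word64(self):
        # Assembly: combine both halves into one Global Memory word.
        return (self.upper << 32) | self.lower

    def load64(self, word):
        # Disassembly: split a Global Memory word into readable halves.
        self.upper = (word >> 32) & MASK32
        self.lower = word & MASK32


def swap_bytes32(value):
    """Reverse the byte order of a 32-bit value (assumed swap scheme)."""
    return int.from_bytes((value & MASK32).to_bytes(4, "big"), "little")
```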
The single 68030 processor on each Director is used to control the various pipes moving data to/from Global Memory through the pipe(s) on that Director. Other than the DSA pipe, the pipes with multiple memory word transfer capacity function in a substantially similar manner. Each pipe, except for the DSA pipe, is under the control of a 32 bit (write only) command register which is written by the 68030. Pipe commands are routed to and decoded by data transfer control programmable array logic (PAL), which receives command bits and outputs necessary control lines. Pipe status is maintained in a 32 bit (read only) status register which can be read by the 68030.
In order to keep component counts down and to maximize utilization of circuitry, some components are shared among the pipelines, as illustrated in FIG. 1E. Generally, each pipe has to arbitrate for use of the common hardware. There is independent arbitration for Global Memory, which is shared among all Directors. There is only one MDR on each Director for implementing the DMA or DSA pipes for communication between the Global Memory and the 68030; however, the single MDR may be accessed by means of different addresses depending on the pipe being used, i.e. DMA or DSA, or depending on the type of memory operation being performed by the 68030. Each Director has only one "flow through" IDT EDAC, and only one DPR.
Dual Port RAM is located on all Directors and is present in all of the data transfer pipes. The DPR serves as a buffer between the Director and the Global Memory. The dual ports to each RAM location facilitate access to each location by different "sides" of the system. The DPR is accessed by a "lower" or "machine side" (corresponding to the DPR side accessed by the host, disk array or 68030), and by an "upper" or "global memory side" (corresponding to the side accessed by the Global Memory). There is independent arbitration on the "lower" side of the DPR and the "upper" side of the DPR.
Each pipe is allocated a unique block of DPR. Particular locations of DPR are mapped to particular banks/addresses of Global Memory. In using the DPR as a buffer to Global Memory, it is accessed by a system of pointers. An upper pointer points to the upper side and a lower pointer points to the lower side. There is a unique upper and lower pointer for each pipe on a Director, stored in corresponding upper and lower pointer register files. All pointer values are initially loaded by the 68030 executing a control program, and Global Memory writes and reads are effected under control thereof. For transfers to Global Memory, i.e. Global Memory writes, memory words are assembled by the Gate Arrays or MDR and passed to the DPR at the location indicated by the lower pointer. The lower pointer increments until the appropriate sized memory block is written to the DPR (typically eight 64-bit memory words). During this time the upper pointer remains unchanged. Transfers from the DPR to the Global Memory commence, typically upon completion of the transfer of the 8 memory words to DPR, whereupon the upper pointer is incremented for each word transfer from DPR to Global Memory. During the transfer to Global Memory, the Gate Array (or MDR, depending upon the pipe accessing Global Memory) continues to fill the DPR at the lower/machine side using the lower pointer. For transfers from Global Memory, i.e. Global Memory reads, the processing is effectively the same under control of the 68030 and control program, but in reverse.
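The two-pointer buffering scheme for a Global Memory write can be sketched as a small software model. The class `DprPipeBuffer`, its default block size of 16 words, and the wrap-around of pointers within the block are assumptions made for illustration; the text specifies only that the lower pointer advances as the machine side fills the DPR and the upper pointer advances as words move to Global Memory.

```python
BURST_WORDS = 8  # typical block size, per the text


class DprPipeBuffer:
    """Toy model of one pipe's DPR block with lower and upper pointers."""

    def __init__(self, size=16):
        self.size = size
        self.ram = [0] * size
        self.lower = 0  # machine side: next location to fill
        self.upper = 0  # global memory side: next location to drain

    def machine_write(self, word):
        # The gate array (or MDR) stores a word at the lower pointer.
        if self.lower - self.upper >= self.size:
            raise RuntimeError("DPR block is full")
        self.ram[self.lower % self.size] = word
        self.lower += 1

    def burst_ready(self):
        # A burst may start once a full block of words has accumulated.
        return self.lower - self.upper >= BURST_WORDS

    def burst_to_global(self, global_mem, addr):
        # Each word moved to Global Memory increments the upper pointer.
        if not self.burst_ready():
            raise RuntimeError("fewer than 8 words buffered")
        for i in range(BURST_WORDS):
            global_mem[addr + i] = self.ram[self.upper % self.size]
            self.upper += 1
```

Because the two pointers advance independently, the machine side can keep calling `machine_write` for the next block while an earlier block is being drained.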
Each Global Memory board provides for two port access by the Directors via the two independent buses designated "A" and "B". Global Memory supports a burst transfer feature whereby 1 to 8 memory words can be transferred sequentially to/from memory in a single memory access for all of the pipes except the DSA pipe, which only transfers a single memory word at a time. The memory on each board is organized in banks of 8 to facilitate the burst mode transfers. The system operates most efficiently with a burst size of 8 and with a starting address aligned on an 8-word boundary, that is, when the starting address starts at bank 0 and continues up to include bank 7. Each word transfer is clocked by a burst clock signal generated on the memory board; however, the Director is responsible for calculating a final address for each burst transfer. At the end of each transfer, the Director makes a check between address bits and upper pointer bits, which have been incremented by the burst clock, to ensure integrity of the transfer.
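The burst bookkeeping described above can be sketched as follows. The text does not say which address and pointer bits the Director compares, so `transfer_ok` is a hypothetical stand-in that simply checks that the upper pointer advanced by the number of words implied by the start and final addresses. Addresses here are in units of memory words, which is also an assumption.

```python
BANKS = 8      # memory banks per board, per the text
MAX_BURST = 8  # largest burst, in 64-bit memory words


def final_address(start, count):
    """Word address of the last memory word in a burst of `count` words."""
    if not 1 <= count <= MAX_BURST:
        raise ValueError("burst size must be 1 to 8 words")
    return start + count - 1


def is_optimal_burst(start, count):
    """True for a full burst that starts at bank 0 and ends at bank 7."""
    return count == MAX_BURST and start % BANKS == 0


def transfer_ok(start, count, upper_before, upper_after):
    """Hypothetical end-of-transfer check of address range vs. pointer."""
    expected_words = final_address(start, count) - start + 1
    return upper_after - upper_before == expected_words
```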
As previously indicated, there is independent arbitration for the Global Memory. There are three possible users for the memory array on a Global Memory board: the "A" port or bus; the "B" port or bus; and refresh. Refresh has the highest priority while the A and B ports operate on a rotating priority. The A and B buses effectively combine on the memory board and therefore only one bus can be in use on any single memory board at any time. A Director will request a port to the A or B bus by driving the address and command lines, and asserting a select line for the port under control of the 68030 and control program. A port will be available for use: if it is not in refresh; if any other accesses to it are complete; if the port is not locked through the other port, or if the other port is not in use. If a port is successfully selected, it will return a bus grant signal to the Director selecting it, which will remain asserted until the transfer is complete.
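The priority scheme among refresh and the A and B ports can be sketched as a simple arbiter. The class name, the `grant` signature, and the choice of which port wins the very first tie are assumptions for illustration; port locking and completion of outstanding accesses are not modeled.

```python
class MemoryBoardArbiter:
    """Toy arbiter: refresh always wins; ports A and B rotate priority."""

    def __init__(self):
        # Assumed initial state so that port A wins the first tie.
        self.last_granted = "B"

    def grant(self, refresh, req_a, req_b):
        """Return 'refresh', 'A', 'B', or None for one arbitration cycle."""
        if refresh:
            return "refresh"            # refresh has the highest priority
        if req_a and req_b:
            # Rotating priority: the port granted last time yields.
            winner = "B" if self.last_granted == "A" else "A"
        elif req_a:
            winner = "A"
        elif req_b:
            winner = "B"
        else:
            return None                 # no requests pending
        self.last_granted = winner
        return winner
```

Only one winner is returned per cycle, reflecting that only one bus can be in use on a given memory board at any time.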
The architecture of the Symmetrix ICDA generally facilitates greater data throughput in the data storage system. High reliability and high performance are achieved, among other things, via the dual A and B bus system that allows odd and even memory banks to be accessed with some overlap. Also, performance is enhanced through pipelining including buffering of memory words via the DPR so that memory words can be assembled/disassembled in the gate array pipeline stage and loaded to/from DPR substantially simultaneously with the transfer of memory words to/from Global Memory. However, there are limits on the performance that can be achieved with this ICDA architecture. Specifically, processing speeds are limited by the 33 MHz speed of the 68030 processor. Furthermore, data transfer rates are limited by latency related to servicing pipeline requests. Higher data rates, while difficult to achieve, continue to be desirable for on-line data storage systems.