1. Field of the Invention
This invention relates to data storage systems, and more particularly to data storage systems having a storage device controller interposed between a host computer and one or more data storage devices wherein the controller manages the storage of data within the one or more storage devices.
2. Description of the Related Art
Auxiliary storage devices such as magnetic or optical disk arrays are usually preferred for high-volume data storage. Many modern computer applications, such as high resolution video or graphic displays involving on-demand video servers, depend heavily on the capacity of the host computer to perform in a data-intensive environment. In other words, because the data must be stored externally in relatively slower auxiliary data storage devices, the host computer system must accomplish the requisite data transfers at a rate that does not severely restrict the utility of the application that necessitated the high-volume data transfers. Due to the speed differential between a host processor and an external storage device, a storage controller is almost invariably employed to manage data transfers between the host and the storage device.
The purpose of a storage controller is to manage the storage for the host processor, leaving the higher speed host processor to perform other tasks during the time the storage controller accomplishes the requested data transfer to/from the external storage. The host generally performs simple data operations such as data reads and data writes. It is the duty of the storage controller to manage storage redundancy, hardware failure recovery, and volume organization for the data in the auxiliary storage. RAID (Redundant Array of Independent Disks) algorithms are often used to manage data storage among a number of disk drives.
FIG. 1 is a diagram of a conventional computer system 10 including a host computer 12 coupled to a storage controller 14 by a link 16 and two storage devices 18a-b coupled to storage controller 14 by respective links 20a-b. Each storage device 18 may be, for example, a disk drive array or a tape drive. Links 16 and 20a-b may include suitable interfaces for I/O data transfers (e.g., Fibre Channel, small computer system interface or SCSI, etc.). As is evident from FIG. 1, all of the information involved in data transfers between host computer 12 and storage devices 18a-b passes through storage controller 14. Storage controller 14 receives command, status, and data packets during the data transfer.
FIG. 2 is a diagram illustrating an exemplary flow of packets during a data read operation initiated by host computer 12 of FIG. 1. Links 16 and 20a-b in FIG. 1 may be Fibre Channel links, and the data transfer protocol evident in FIG. 2 may be the Fibre Channel protocol. Referring now to FIGS. 1 and 2 together, host computer 12 issues a read command packet identifying storage controller 14 as its destination (XID=H, A) via link 16. Storage controller 14 receives the read command and determines that two separate read operations are required to obtain the requested data; one from storage device 18a and the other from storage device 18b. 
Storage controller 14 translates the read command from host computer 12 into two separate read commands, one read command for storage device 18a and the other read command for storage device 18b. Storage controller 14 transmits a first read command packet identifying storage device 18a as its destination (XID=A, B) via link 20a, and a second read command packet identifying storage device 18b as its destination (XID=A, C) via link 20b. Each read command packet instructs respective storage devices 18a-b to access and provide data identified by the read command. Storage device 18a (ID=B) accesses the requested data and transmits a data packet followed by a status packet (XID=B, A) to storage controller 14 via link 20a. Storage device 18b (ID=C) accesses the requested data and transmits a data packet followed by a status packet (XID=C, A) to storage controller 14 via link 20b. Each status packet may indicate whether the corresponding read operation was successful, i.e. whether the data read was valid.
As indicated in FIG. 2, storage controller 14 temporarily stores the data and status packets in a memory unit within storage controller 14. Storage controller 14 then consolidates the data received from storage devices 18a-b and processes the status packets received from storage devices 18a-b to form a composite status. Storage controller 14 transmits the consolidated data followed by the composite status (XID=A, H) to host computer 12 via link 16, completing the read operation. In the event that the composite status indicates a read operation error, host computer 12 may ignore the consolidated data and initiate a new read operation. In general, the flow of packets depicted in FIG. 2 is typical of a two-party point-to-point interface protocol (e.g., the Fibre Channel protocol).
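By way of illustration only, the conventional read flow of FIGS. 1 and 2 may be sketched as follows. The names Packet, device_read, and controller_read, as well as the sample payloads, are hypothetical and chosen for this sketch; the list named buffered merely stands in for the memory unit of storage controller 14. The sketch is intended to show that every data byte is held by the controller before it reaches the host.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str             # sender ID, e.g. "B" for storage device 18a
    dst: str             # receiver ID, e.g. "A" for storage controller 14
    kind: str            # "data" or "status"
    payload: bytes = b""
    ok: bool = True      # meaningful for status packets only


def device_read(dev_id, controller_id, data, ok=True):
    """Hypothetical device: a data packet followed by a status packet."""
    return [
        Packet(dev_id, controller_id, "data", data),
        Packet(dev_id, controller_id, "status", ok=ok),
    ]


def controller_read(host_id, controller_id, devices):
    """Store-and-forward read: buffer all packets, then consolidate."""
    buffered = []  # assumption: models the controller's memory unit
    for dev_id, (data, ok) in devices.items():
        buffered.extend(device_read(dev_id, controller_id, data, ok))

    # Consolidate the data and form a composite status.
    data = b"".join(p.payload for p in buffered if p.kind == "data")
    composite_ok = all(p.ok for p in buffered if p.kind == "status")
    return [
        Packet(controller_id, host_id, "data", data),
        Packet(controller_id, host_id, "status", ok=composite_ok),
    ]


# Host H reads through controller A from devices B and C.
reply = controller_read("H", "A", {"B": (b"first-", True), "C": (b"second", True)})
failed = controller_read("H", "A", {"B": (b"first-", True), "C": (b"", False)})
```

In this sketch a single failed device read causes the composite status to indicate an error, which corresponds to the case in which host computer 12 may ignore the consolidated data and retry.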
A typical storage controller includes multiple ports and one or more CPUs coupled to a communication bus, and a memory bus coupling the one or more CPUs to a memory unit. Two parameters are commonly used to measure the performance of a storage system: (1) the number of input/output (I/O) operations per second (iops), and (2) the data transfer rate of the storage system. Generally, the rate of execution of iops by a storage controller is governed by the type, speed and number of CPUs within the storage controller. The data transfer rate depends on the data transfer bandwidth of the storage controller. In computer system 10 described above, all of the data transferred between host computer 12 and storage devices 18a-b is temporarily stored within the memory unit of storage controller 14, and thus travels through the memory bus of storage controller 14. As a result, the data transfer bandwidth of storage controller 14 is largely dependent upon the bandwidth of the memory bus of storage controller 14.
Current storage systems have restricted scalability because their storage controllers have a relatively inflexible ratio of CPU capability to bandwidth capability. In other words, as evident in FIGS. 1 and 2, the data transfer rate between host computer 12 and storage devices 18a-b is dependent upon control functions (i.e., command translation and status processing) performed by storage controller 14. This interdependence between iops and data transfer rate results in less efficient scalability of performance parameters. For example, in conventional storage controller architectures, an increase in data transfer rate may require both an increase in data transfer bandwidth and an increase in the number of CPUs residing within the controller.
It would thus be desirable to have a storage controller where control functionality (as measured by the iops parameter) is scalable independently of the data transfer bandwidth (which determines the data transfer rate), and vice versa. It may be further desirable to achieve independence in scalability without necessitating a change in the existing interface protocol managing the host-controller-storage interface.
Several embodiments of a computer system are described which achieve separation of control and data paths during data transfer operations, thus allowing independent scalability of storage system performance factors (e.g., storage system iops and data transfer rate). In one embodiment, the computer system includes a data switch coupled between a host computer and one or more storage devices. A storage controller for managing the storage of data within the one or more storage devices is coupled to the switch. The switch includes a memory for storing data routing information generated by the controller, and uses the data routing information to route data directly between the host computer and the one or more storage devices such that the data does not pass through the storage controller. Within the computer system, information may be conveyed between the host computer, the switch, the one or more storage devices, and the storage controller according to a two-party protocol such as the Fibre Channel protocol. The computer system achieves separation of control and data paths using a modified switch and standard host adapter hardware and host driver software. In addition, a two-party protocol such as the Fibre Channel protocol is not violated.
The one or more storage devices, the storage controller, and the switch make up a storage system of the computer system. The switch receives a data transfer command from the host computer and directs the data transfer command to the storage controller. In response to the data transfer command, the storage controller translates the data transfer command into one or more translated data transfer commands, and also generates frame header substitution information. The storage controller transmits the one or more translated data transfer commands and the frame header substitution information to the switch.
The switch routes the one or more translated data transfer commands to the appropriate storage devices and stores the frame header substitution information within the memory. The switch replaces header information of one or more data frames associated with the data transfer operation with the substitute header information such that the data frames are routed directly between the host computer and the storage devices and do not pass through the storage controller.
Each data frame includes header information within a header field, and the header information includes a destination address. The switch routes a given data frame based upon the destination address. The frame header substitution information includes a substitute destination address generated by the storage controller such that when the switch replaces header information of the data frames with the substitute header information, the data frames are routed directly between the host computer and the storage device and do not pass through the storage controller.
When the data transfer command from the host computer is a read command, the substitute destination address is the address of the host computer. The switch receives the one or more data frames associated with the read operation from the one or more storage devices, and routes the one or more data frames directly to the host computer such that the data frames do not pass through the storage controller.
When the data transfer command from the host computer is a write command, the substitute destination address is the address of the one or more storage devices. The switch receives the one or more data frames associated with the write operation from the host computer, and routes the data frames directly to the one or more storage devices such that the data frames do not pass through the storage controller.
The frame header substitution information may include target header information and corresponding substitute header information. Upon receiving a data frame, the switch may compare the header information of the data frame to the target header information stored within the memory. If the header information of the data frame matches the target header information, the switch may replace the header information of the data frame with the substitute header information corresponding to the target header information. Following replacement of the header information of the data frame with the substitute header information, the switch may calculate a cyclic redundancy check (CRC) value for the data frame and insert the CRC value into the data frame. The substitute header information may include the substitute destination address as described above. The switch may then route the data frame dependent upon the substitute destination address. As a result, the data frame may move directly between the host computer and the storage device such that the data frame does not pass through the storage controller.
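The compare, replace, and recalculate sequence described above may be sketched as follows, purely as an illustration. The Header fields (dst, src, exchange_id), the SubstitutionTable class, and the byte layout fed to the CRC are assumptions made for this sketch and are not the Fibre Channel frame format; zlib.crc32 is used only as a stand-in CRC. Substituting the source address along with the destination address is likewise an assumption of the sketch.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    dst: str          # destination address used for routing
    src: str          # source address
    exchange_id: int  # hypothetical identifier tying frames to one transfer


@dataclass(frozen=True)
class Frame:
    header: Header
    payload: bytes
    crc: int


def compute_crc(header, payload):
    # Assumption: the CRC covers a simple text encoding of the header
    # followed by the payload. A real frame format differs.
    raw = f"{header.dst}|{header.src}|{header.exchange_id}".encode() + payload
    return zlib.crc32(raw)


def make_frame(header, payload):
    return Frame(header, payload, compute_crc(header, payload))


class SubstitutionTable:
    """Models the switch memory: target header -> substitute header."""

    def __init__(self):
        self._entries = {}

    def store(self, target, substitute):
        self._entries[target] = substitute

    def process(self, frame):
        substitute = self._entries.get(frame.header)
        if substitute is None:
            return frame  # no match: the frame is left unchanged
        # Match: replace the header, then recalculate and insert the CRC.
        return make_frame(substitute, frame.payload)


# Read example: device B addresses its data frame to controller A, and the
# stored substitute header redirects the frame to host H.
table = SubstitutionTable()
table.store(Header("A", "B", 7), Header("H", "A", 1))
incoming = make_frame(Header("A", "B", 7), b"block data")
outgoing = table.process(incoming)
unrelated = make_frame(Header("A", "C", 9), b"status")
```

Because the CRC is recomputed after the replacement, the receiver of the outgoing frame can validate it against the substituted header rather than the original one.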
Following a data transfer operation, the switch may receive status information associated with the data transfer operation from the one or more storage devices. The switch may route the status information to the storage controller. In response, the storage controller may generate an overall status which may be a consolidation of separate status information from multiple storage devices. The storage controller may transmit the overall status to the switch, and the switch may route the overall status to the host computer.
The one or more storage devices may include multiple disk drives, and the storage controller may manage the one or more storage devices as a RAID (Redundant Array of Independent Disks) array. Accordingly, the storage controller may generate the translated data transfer commands dependent upon the RAID array configuration of the one or more storage devices.
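As one hedged example of generating translated data transfer commands dependent upon the array configuration, the sketch below assumes simple striping without redundancy (RAID level 0). The striping layout, the DeviceRead class, the translate_read function, and the parameters num_devices and stripe_blocks are all assumptions made for illustration; other RAID levels would require different translation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeviceRead:
    device: int  # index of the target storage device
    lba: int     # starting block address on that device
    count: int   # number of blocks to read


def translate_read(lba: int, count: int, num_devices: int,
                   stripe_blocks: int) -> List[DeviceRead]:
    """Split one host read into per-device reads (assumed RAID 0 layout)."""
    commands = []
    end = lba + count
    while lba < end:
        stripe = lba // stripe_blocks   # which stripe unit holds this block
        offset = lba % stripe_blocks    # position inside the stripe unit
        device = stripe % num_devices   # stripe units rotate across devices
        device_lba = (stripe // num_devices) * stripe_blocks + offset
        # Read up to the end of this stripe unit or the end of the request.
        n = min(stripe_blocks - offset, end - lba)
        commands.append(DeviceRead(device, device_lba, n))
        lba += n
    return commands


# Host reads 8 blocks starting at block 6; two devices, 4-block stripe units.
commands = translate_read(lba=6, count=8, num_devices=2, stripe_blocks=4)
```

For this request the sketch yields three translated commands: two blocks from device 1 starting at block 2, four blocks from device 0 starting at block 4, and two blocks from device 1 starting at block 4.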
One embodiment of the data switch is a crossbar switch including multiple input and output ports coupled to an array of switching elements. Each input port is adapted for coupling to a transmission medium and receives information via the transmission medium. Each output port is adapted for coupling to a transmission medium and configured to transmit information via the transmission medium. The array of switching elements selectively couples the input ports to the output ports. A switch matrix control unit receives routing information from the input ports and controls the array of switching elements dependent upon the routing information. Each input port includes a memory unit for storing the frame header substitution information. Each input port receives frame header substitution information and stores the frame header substitution information within the memory unit.
During a data transfer operation, one or more of the input ports receives a data frame including header information as described above. Each input port receiving a data frame replaces the header information of the data frame with the substitute header information stored within the memory unit. As a result, the substitute destination address becomes the destination address, and the input port provides the destination address to the switch matrix control unit as the routing information.
Each input port may include a port control unit configured to control the input port and an input queue for storing received information, wherein the port control unit is coupled to the memory unit and to the input queue. When the data frame is received, the data frame is stored within the input queue. The port control unit may compare the header information of the data frame to the target header information stored within the memory unit. If the header information of the data frame matches the target header information, the port control unit may replace the header information of the data frame with the substitute header information corresponding to the target header information. After the port control unit replaces the header information of the data frame with the substitute header information, the port control unit may calculate a CRC value for the data frame and insert the CRC value into the data frame. The switch matrix control unit couples the input port to an output port via the array of switching elements dependent upon the substitute destination address. As a result, the data frame may move directly between the host computer and the storage device such that the data frame does not pass through the storage controller.
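The cooperation between an input port and the switch matrix control unit may be sketched as follows, as an illustration only. The InputPort and Crossbar classes, the representation of a header as a (destination, source) tuple, and the address-to-port map are assumptions of the sketch. CRC recalculation is omitted here for brevity, and the array of switching elements is reduced to appending a frame to an output list.

```python
from collections import deque


class InputPort:
    def __init__(self):
        self.memory = {}      # memory unit: target header -> substitute header
        self.queue = deque()  # input queue for received frames

    def store_substitution(self, target, substitute):
        self.memory[target] = substitute

    def receive(self, header, payload):
        self.queue.append((header, payload))

    def next_frame(self):
        """Port control unit: dequeue a frame and substitute on a match."""
        header, payload = self.queue.popleft()
        header = self.memory.get(header, header)
        return header, payload


class Crossbar:
    def __init__(self, port_of_address):
        # Assumption: each attached node is reached through one port index.
        self.port_of_address = port_of_address
        indices = set(port_of_address.values())
        self.inputs = {i: InputPort() for i in indices}
        self.outputs = {i: [] for i in indices}

    def step(self, input_index):
        """Switch matrix control: route one frame by destination address."""
        header, payload = self.inputs[input_index].next_frame()
        output_index = self.port_of_address[header[0]]  # header[0] is dst
        self.outputs[output_index].append((header, payload))
        return output_index


# Host H on port 0, controller A on port 1, storage device B on port 2.
switch = Crossbar({"H": 0, "A": 1, "B": 2})
switch.inputs[2].store_substitution(("A", "B"), ("H", "A"))
switch.inputs[2].receive(("A", "B"), b"block data")
routed_to = switch.step(2)
```

In this example the frame sent by the storage device toward the controller leaves through the port coupled to the host, and nothing is delivered to the port coupled to the controller.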