The present invention relates to computer storage networks and, more particularly, to storage processors that intercept and process storage commands and data sent between initiators and targets.
Computer workstations, application servers or other computers (collectively hereinafter referred to as “initiators”) frequently access data that is stored remotely from the initiators. In these cases, computer networks are used to connect the initiators to storage devices (such as disks) that store the data. For example, Information Systems (IS) departments frequently maintain “disk farms,” tape backup facilities and magnetic, optical and other storage devices (sometimes referred to as media) in one or more central locations and provide access to these storage devices via computer networks. This centralized storage enables data stored on the storage devices to be shared by many initiators scattered throughout an organization. Such an arrangement also enables the IS departments to store the data in secure, controlled environments on highly reliable (sometimes redundant) equipment, so the data remains available, even in case of a catastrophic failure of one or more of the storage devices. Centralized data storage also facilitates making frequent backup copies of the data and providing access to backed-up data, when necessary.
Specialized computers (variously referred to as file servers, storage servers, filers, etc., collectively hereinafter referred to as “storage appliances”) located in the centralized locations make the data on the storage devices available to the initiators. Software in the storage appliances and other software in the initiators cooperate to make the central storage devices appear to users and application programs as though the storage devices are locally connected to the initiators.
In addition, the storage appliances can perform services that are not visible to the initiators. For example, a storage appliance can redundantly store data on a set of storage devices, such as on a Redundant Array of Inexpensive (or Independent) Disks (RAID). Several levels of RAID are available, including disk mirroring, which is commonly referred to as RAID-1. The storage appliance provides an interface that appears to the initiators to be an interface to an actual storage device. However, the apparent storage device does not exist. Instead, the storage appliance accepts input/output commands directed to the apparent storage device (referred to as a “virtual device”), and the storage appliance performs input/output operations on one or more actual storage devices to satisfy the commands to the virtual device.
For example, when the initiator writes data to a virtual device that is implemented as a RAID-1, the storage appliance writes the data to all the applicable members of the RAID-1. Once all the actual write operations complete successfully, the storage appliance returns a successful status indicator to the initiator. The initiator remains unaware that the data has been stored on multiple actual storage devices. When the initiator reads from the virtual device, the storage appliance reads data from the applicable member(s) of the RAID and sends the data to the initiator. If one member of the RAID fails, the storage appliance uses the remaining members of the RAID to continue satisfying read and write commands from the initiator until (optionally) the failed RAID member is replaced.
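The RAID-1 behavior described above can be illustrated with a minimal sketch. The class names MemoryDisk and Raid1VirtualDevice, the in-memory block store and the "GOOD" status string are hypothetical and are used only for illustration; an actual storage appliance would issue the operations to real storage devices.

```python
class MemberFailed(Exception):
    """Raised when a (hypothetical) member device cannot service a request."""


class MemoryDisk:
    """Hypothetical in-memory stand-in for one actual storage device."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.failed = False

    def write(self, lba, data):
        if self.failed:
            raise MemberFailed("device has failed")
        self.blocks[lba] = bytes(data)

    def read(self, lba):
        if self.failed:
            raise MemberFailed("device has failed")
        return self.blocks[lba]


class Raid1VirtualDevice:
    """Virtual device that mirrors every write to all working members."""

    def __init__(self, members):
        self.members = list(members)

    def _working_members(self):
        members = [m for m in self.members if not m.failed]
        if not members:
            raise MemberFailed("no working members remain")
        return members

    def write(self, lba, data):
        # Status is reported only after every member write has completed.
        for member in self._working_members():
            member.write(lba, data)
        return "GOOD"

    def read(self, lba):
        # Any working member holds a full copy of the data.
        return self._working_members()[0].read(lba)


disk_a, disk_b = MemoryDisk(8), MemoryDisk(8)
virtual = Raid1VirtualDevice([disk_a, disk_b])
status = virtual.write(3, b"x" * 512)

disk_a.failed = True                 # simulate the loss of one member
data_after_failure = virtual.read(3)
```

In this sketch the initiator interacts only with the virtual device and is not informed of how many members exist or that one of them has failed.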
The Small Computer System Interface (SCSI) protocol defines electrical characteristics of a cable (a parallel SCSI bus) and a set of commands and responses that can be sent over the SCSI bus or over a serial interconnect, according to which a computer (sometimes referred to as a host) can control storage and other types of devices. Fibre Channel (FC) defines characteristics of high-performance optical and electrical cables and a multi-level control protocol that can be used to interconnect a pair of devices to each other in a point-to-point connection, a small number (up to 127) of devices in an “arbitrated loop” or a large number (up to 2**24) of devices in a “fabric.” Such a fabric is similar to a computer network and typically includes one or more switches, similar to switches found in computer networks.
In some cases, storage devices are connected to storage appliances via dedicated, high-performance computer networks, commonly referred to as Storage Area Networks (SANs). Fibre Channel technology is commonly used to interconnect storage devices and storage appliances in SANs.
More recently, a protocol known as iSCSI (Internet SCSI) has been used to exchange SCSI commands and data over packet-switched networks (typically Internet Protocol (IP) networks), such as local area networks (LANs) and the Internet. iSCSI Protocol Data Units (PDUs) are used to encapsulate and reliably deliver SCSI Command Descriptor Blocks (CDBs), data and status over a packet-switched network. These PDUs are exchanged between initiators and storage appliances or between initiators and iSCSI storage devices that are connected to the network. In storage parlance, an initiator communicates with a “target,” which performs the requested operation. For example, storage appliances and iSCSI storage devices that are connected to IP networks are referred to as targets.
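As an illustration of this encapsulation, the following sketch builds a SCSI WRITE(10) CDB and places it in the 48-byte basic header segment of an iSCSI SCSI Command PDU. The function names, the default flag values and the raw 8-byte LUN argument are assumptions made for illustration. The field offsets follow the iSCSI specification (RFC 7143) as understood here and should be verified against that specification before being relied upon.

```python
import struct

SCSI_WRITE_10 = 0x2A        # SCSI operation code for WRITE(10)
ISCSI_SCSI_COMMAND = 0x01   # iSCSI opcode for a SCSI Command PDU


def build_write10_cdb(lba, num_blocks):
    """Return a 10-byte WRITE(10) CDB (big-endian fields)."""
    # opcode, flags, 4-byte LBA, group number, 2-byte transfer length, control
    return struct.pack(">BBIBHB", SCSI_WRITE_10, 0, lba, 0, num_blocks, 0)


def build_scsi_command_bhs(cdb, lun, task_tag, expected_len, cmd_sn, exp_stat_sn):
    """Return a 48-byte header carrying the CDB; no immediate data is attached."""
    flags = 0x80 | 0x20               # F (final) and W (write) bits, assumed here
    data_segment_length = (0).to_bytes(3, "big")
    return struct.pack(
        ">BB2xB3s8sIIII16s",
        ISCSI_SCSI_COMMAND,           # byte 0: opcode
        flags,                        # byte 1: flags
        0,                            # byte 4: TotalAHSLength
        data_segment_length,          # bytes 5-7: DataSegmentLength
        lun,                          # bytes 8-15: logical unit number
        task_tag,                     # bytes 16-19: Initiator Task Tag
        expected_len,                 # bytes 20-23: Expected Data Transfer Length
        cmd_sn,                       # bytes 24-27: CmdSN
        exp_stat_sn,                  # bytes 28-31: ExpStatSN
        cdb,                          # bytes 32-47: CDB, zero-padded to 16 bytes
    )


cdb = build_write10_cdb(lba=2048, num_blocks=16)
pdu_header = build_scsi_command_bhs(
    cdb, lun=bytes(8), task_tag=1, expected_len=16 * 512, cmd_sn=1, exp_stat_sn=1
)
```

The write data itself would follow in the data segment of this PDU or in subsequent Data-Out PDUs; that part is omitted from the sketch.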
As noted, a storage appliance is typically connected to one or more initiators via an IP network or an IP fabric. To facilitate this network connection, the storage appliance includes one or more network interfaces that handle transmission and reception of signals over a medium, such as a twisted pair of wires, an optical fiber or a wireless link. If the storage appliance is also connected to a SAN, the storage appliance includes one or more network or port interfaces that are connected to the fabric of the SAN. The storage appliance also includes at least one processor to control operation of the storage appliance and memory to store a control program and operating parameters and to buffer data.
When data is to be written to a virtual storage device, and a target (typically a storage appliance) that implements the virtual storage device performs some storage service that involves more than one actual storage device, such as implementing a RAID, the storage appliance buffers the write data until all the write operations on all the applicable storage devices complete. However, if the storage devices that make up a virtual device (such as a RAID) are made by different manufacturers or have different operating characteristics, the maximum amount of data each actual storage device can accept at one time (the “maximum segment size”) and the maximum amount of data the storage device can accept in a series of segments (the “maximum burst size”) can vary from storage device to storage device. Other parameters can also vary among the storage devices.
Each storage device indicates to the storage appliance the maximum segment size and the maximum burst size it is able to handle. For example, a lightly loaded storage device might have more local buffer space available than a heavily loaded storage device. Thus, the lightly loaded device can indicate that it is able to handle larger segment and burst sizes than the heavily loaded storage device indicates. Storage devices made by disparate manufacturers can indicate disparate maximum segment and burst sizes, as well as other parameters, whose values are different among a set of storage devices, even under identical load conditions. To accommodate these various maximum segment and burst sizes, the storage appliance receives and buffers all the data of the write operation from the initiator before beginning to write to any of the actual storage devices. After all the data is received, the storage appliance performs various-sized write operations to each of the actual storage devices, according to their respective maximum segment sizes, burst sizes, etc.
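The division of a fully buffered write into various-sized write operations can be sketched as follows. The function name plan_bursts and the example sizes are hypothetical; the sketch assumes only that each storage device has indicated a maximum segment size and a maximum burst size, as described above.

```python
def plan_bursts(total_len, max_segment, max_burst):
    """Split total_len bytes into bursts of (offset, length) segments.

    No segment exceeds max_segment bytes, and the segments of one burst
    together never exceed max_burst bytes.
    """
    if max_segment <= 0 or max_burst <= 0:
        raise ValueError("sizes must be positive")
    bursts = []
    offset = 0
    while offset < total_len:
        burst_end = min(offset + max_burst, total_len)
        segments = []
        while offset < burst_end:
            length = min(max_segment, burst_end - offset)
            segments.append((offset, length))
            offset += length
        bursts.append(segments)
    return bursts


# The same 20,000-byte buffered write, planned for two dissimilar devices.
plan_a = plan_bursts(20000, max_segment=8192, max_burst=16384)
plan_b = plan_bursts(20000, max_segment=4096, max_burst=65536)
# plan_a: [[(0, 8192), (8192, 8192)], [(16384, 3616)]]
# plan_b: one burst of five segments, the last being (16384, 3616)
```

Because the two plans address the same buffer at different offsets and lengths, the storage appliance in this arrangement holds the entire write in memory until both plans have been carried out.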
Buffering this data involves copying the data, as it is received by the storage appliance, from the network interface into the memory of the storage appliance. Writing each segment of data to an actual storage device involves copying some of the data out of the memory to a network or fabric interface. Copying the data into, and then out of, the memory takes time. This copying therefore increases the amount of time before the storage appliance can report completion to the initiator (such as in a synchronous mirroring RAID-1 system) and decreases the apparent speed of the virtual storage device, from the viewpoint of the initiator.
Furthermore, the storage appliance must include enough memory to buffer all the simultaneous write operations that it performs on behalf of the initiators. Storage appliances typically handle many concurrent write operations for many initiators. Providing memory to buffer these write operations is expensive. If sufficient memory is not available in the storage appliance to handle a given write operation to a virtual device, the initiator must wait until a previously requested write operation completes and sufficient memory becomes available, thereby delaying the initiator.
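The delay caused by insufficient buffer memory can be illustrated with a simple admission scheme. The class name BufferPool, its methods and the first-come, first-served ordering are assumptions made for illustration and do not describe any particular storage appliance.

```python
from collections import deque


class BufferPool:
    """Hypothetical fixed-size buffer memory shared by concurrent writes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.waiting = deque()   # (write_id, size) pairs, oldest first

    def request(self, write_id, size):
        """Return True if the write may proceed now, False if it must wait."""
        if size > self.capacity:
            raise ValueError("write is larger than the entire buffer memory")
        if not self.waiting and self.in_use + size <= self.capacity:
            self.in_use += size
            return True
        self.waiting.append((write_id, size))
        return False

    def complete(self, size):
        """Release a finished write's memory; return the writes now admitted."""
        self.in_use -= size
        admitted = []
        while self.waiting and self.in_use + self.waiting[0][1] <= self.capacity:
            write_id, waiting_size = self.waiting.popleft()
            self.in_use += waiting_size
            admitted.append(write_id)
        return admitted


pool = BufferPool(capacity=100)
first_admitted = pool.request("write-1", 60)    # fits, proceeds immediately
second_admitted = pool.request("write-2", 60)   # does not fit, must wait
released = pool.complete(60)                    # write-1 finishes
```

In this sketch the second write cannot begin until the first write completes and releases its memory, which corresponds to the delay experienced by the initiator.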
Thus, prior art storage appliances require large amounts of memory to buffer data, and I/O operations performed through these storage appliances exhibit high latency due to the large amount of data that must be moved into and out of the memory.