The need for fast access to massive amounts of shared data in today's networked computing environment has given rise to a data storage and retrieval technology called Storage Area Networks (SANs). Increasingly, SAN deployments depend on existing Transmission Control Protocol/Internet Protocol (TCP/IP) networks via an emerging standard Internet Small Computer Systems Interface (iSCSI) protocol. The Internet Engineering Task Force (IETF) Internet Protocol Storage Working Group has proposed a standard for iSCSI, which was submitted in June, 2002 as an Internet-Draft on the standards track of the IETF. The Internet-Draft, titled “iSCSI” by Julian Satran, et al., can be found at http://ietf.org/internet-drafts/draft-ietf-ips-iscsi-13.txt, and is incorporated herein by reference, and is herein referred to as the IETF iSCSI Internet-Draft. Dependence on existing TCP/IP networks creates a need to streamline communication of SCSI commands over TCP/IP networks, in order to achieve maximal performance levels.
In a traditional approach to data storage, called Direct Attached Storage (DAS), storage devices are linked to a server with a fixed, dedicated connection. Only one server can normally access data on a particular disk, via a local bus, commonly using a Small Computer Systems Interface (SCSI) protocol. The original SCSI protocol was standardized in 1986 by the American National Standards Institute (ANSI) as X3.131-1986. The current evolving SCSI standard is described in a document titled “SCSI Architecture Model - 2 (SAM-2),” produced by T10, Technical Committee of the National Committee on Information Technology Standards, which may be found on the T10 Internet site at ftp://ftp.t10.org/t10/drafts/sam2, and which is incorporated herein by reference. DAS suffers from a number of limitations. For example, the SCSI protocol limits the length of a bus connecting to a device to about 6 meters. Additional limitations and drawbacks include upper limits on speed and on the number of attached storage devices, limited scalability and reliability, and limitations of exclusive ownership of attached storage. These limitations are addressed by SANs.
FIG. 1 is a schematic block diagram depicting an architecture 10 for a Storage Area Network (SAN), as is known in the art. Architecture 10 comprises one or more users 20 operating one or more applications on one or more hardware platforms. The one or more applications generate requests for retrieval or storage of data. These requests are transferred via a LAN 26 to a server. In architecture 10, three different servers are present: a mail server 28, an application server 30, and a database server 32. Each server is designed to handle user requests of a specified type. For example, database server 32 is designed to handle requests to retrieve data from a centralized database. Each server communicates with a SAN 40 to satisfy its requests. Mail server 28, application server 30, and database server 32 generate commands to storage devices 42, 44, 46, and 48 in a format comprehensible to the devices, typically using the SCSI protocol. Storage devices 42, 44, 46, and 48 are also herein termed storage devices A, B, C, and D respectively. The commands are routed through SAN 40 via hubs and switches 34 and 36, which provide one or more connection ports for the storage devices. Storage devices A, B, C, and D comprise SCSI controllers which carry out the actions specified in the SCSI commands. A variety of storage devices and media are known in the art, including redundant arrays of independent disks (RAID), tapes, and optical storage arrays.
SANs handle communication between storage devices and storage clients. As noted above, the SCSI protocol acts as a common, standard interface to storage devices. Devices using the SCSI protocol include input/output (I/O) devices, hard drives, tape drives, CD and DVD drives, printers, and scanners. As well as defining hardware characteristics of a SCSI bus, the SCSI protocol specifies the formats and rules governing commands and responses communicated between storage devices, called “targets” in SCSI terminology, and storage clients, known as “initiators.”
In an article entitled “Overview and History of the SCSI Interface” by Charles M. Kozierok, published in the PC Guide which can be found at http://www.pcguide.com/ref/hdd/if/scsi/over-c.html, and which is incorporated herein by reference, the author emphasizes the general nature of the SCSI interface: “It's important to remember that SCSI is, at its heart, a system interface, as the name suggests. It was first developed for hard disks, is still used most for hard disks . . . For those reasons, SCSI is sometimes thought of as a hard disk interface . . . However, SCSI is not an interface tied specifically to hard disks. Any type of device can be present on the bus . . . ”
A SAN containing SCSI-based storage devices has at its core the task of SCSI transport: facilitating the transmission of SCSI commands and responses between targets and initiators. The first technology released for SCSI transport in the SAN environment was Fibre Channel, using special-purpose hardware, optimized for storage and other high-speed applications. The high costs associated with Fibre Channel installation, management, maintenance, and interoperability, together with the availability of Gigabit Ethernet and 10-Gigabit Ethernet, which are not limited to Fibre Channel fabrics, inter alia gave rise to the iSCSI protocol. Gigabit Ethernet and 10-Gigabit Ethernet provide data rates of one gigabit per second and 10 gigabits per second respectively, based on Ethernet frame formats and protocols, for example, the IEEE 802.3(Z) Ethernet protocol, issued by the Institute of Electrical and Electronics Engineers, Inc., N.J.
The iSCSI protocol is a transport protocol for SCSI commands over TCP networks. TCP is described by Postel in Request For Comments (RFC) 793 of the U.S. Defense Advanced Research Projects Agency (DARPA), entitled “Transmission Control Protocol: DARPA Internet Program Protocol Specification” (1981), which is incorporated herein by reference. The IETF iSCSI Internet-Draft document defines methods for encapsulating SCSI command descriptor blocks (CDBs) and responses into iSCSI messages, known as Protocol Data Units (PDUs), controlling flow, establishing iSCSI sessions, identifying PDUs in the TCP stream, mapping a session to multiple connections, and adding correction code on top of the TCP protocol, among other protocol elements.
A related, informational Internet-Draft by the IP Storage Working Group entitled “iSCSI Requirements and Design Considerations” by Marjorie Krueger, et al. can be found at http://ietf.org/internet-drafts/draft-ietf-ips-iscsi-reqmts-05.txt, and is incorporated herein by reference. Krueger, et al. describe the charter of the IP Storage Working Group as “developing comprehensive technology to transport block storage data over IP protocols . . . The initial version of the iSCSI protocol will define a mapping of SCSI transport protocol over TCP/IP so that SCSI storage controllers (principally disk and tape arrays and libraries) can be attached to IP networks, notably Gigabit Ethernet (GbE) and 10 Gigabit Ethernet (10 GbE).”
The benefits to SAN implementations based on iSCSI derive primarily from the large body of experience, knowledge, tools, and equipment that exist in the industry in both the fields of SCSI and TCP/IP. As Krueger, et al. go on to note, the IP Storage Working Group “has chosen to focus the first version of the protocol to work with the existing SCSI architecture and commands, and the existing TCP/IP transport layer. Both these protocols are widely deployed and well understood. The thought is that using these mature protocols will entail a minimum of new invention, the most rapid possible adoption, and the greatest compatibility with Internet architecture, protocols, and equipment.”
The standard layered architectural model for communications between two users in a network is known as the International Standards Organization's Open Systems Interconnection (ISO/OSI) and is specified in standard ISO/IEC 7498-1:1994, “Open Systems Interconnection - Basic Reference Model: The Basic Model.” An overview of the OSI reference model is provided in an article entitled “OSI,” which can be found at the Internet site http://searchnetworking.techtarget.com/sDefinition/0,,sid7_gci212725,00.html, and which is incorporated herein by reference.
The OSI reference model (OSI-RM) is well known to those skilled in the art, and describes layers of functions which are comprised in network communications. A layer comprises one or more protocols which work together to provide a set of network functions, with each intermediate protocol layer using the layer below it to provide services to the layer above it. The hierarchical aggregation of these protocols is known as a protocol stack.
Reference is now made to FIG. 2, which is a schematic block diagram depicting a five-layer protocol stack 60 used in iSCSI, as is known in the art. Protocol stack 60 is a set of protocol layers required to transfer SCSI commands over a TCP/IP network. Each layer may be implemented in software or hardware, or a combination of both. A SCSI highest layer 62 comprises formulating and interpreting SCSI CDBs and responses 72, and is typically implemented in an operating system or SCSI controller. CDBs and responses 72 pass to an iSCSI layer 64 responsible for implementing the iSCSI protocol, and create iSCSI Protocol Data Units (PDUs) 73, by adding to the SCSI CDBs and responses. Additions may comprise headers and other information needed to facilitate transport in a network, e.g., a length of the iSCSI PDU.
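By way of illustration only, the encapsulation performed by an iSCSI layer such as layer 64 may be sketched as follows. The eight-byte header used here (a one-byte PDU type, three pad bytes, and a four-byte payload length) is a hypothetical simplification chosen for brevity and is not the header layout defined in the IETF iSCSI Internet-Draft; the names TOY_HEADER, build_pdu, and parse_pdu are likewise assumptions of this sketch.

```python
import struct

# Hypothetical toy header: 1-byte PDU type, 3 pad bytes, 4-byte payload length.
# This is NOT the header format of the IETF iSCSI Internet-Draft.
TOY_HEADER = struct.Struct(">B3xI")


def build_pdu(pdu_type: int, cdb: bytes, data: bytes = b"") -> bytes:
    """Prefix a SCSI CDB (and optional data) with a header carrying its length."""
    payload = cdb + data
    return TOY_HEADER.pack(pdu_type, len(payload)) + payload


def parse_pdu(pdu: bytes):
    """Recover the PDU type and payload from a PDU produced by build_pdu."""
    pdu_type, length = TOY_HEADER.unpack_from(pdu)
    start = TOY_HEADER.size
    return pdu_type, pdu[start:start + length]


# A ten-byte SCSI READ(10) CDB: operation code 0x28, flags, logical block
# address, group number, transfer length, control byte.
read10_cdb = struct.pack(">BBIBHB", 0x28, 0, 2048, 0, 8, 0)
pdu = build_pdu(0x01, read10_cdb)  # 0x01 is an arbitrary type value
```

Carrying the payload length in the header, as in this sketch, is what allows a receiver to locate the end of one PDU and the beginning of the next in a byte stream.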
Optionally, PDUs contain a header digest and data digest. A digest, as is known in the art, is a string of digits calculated by a function such as a one-way hash formula applied to a stream of data, and is used to verify data integrity. A digest is calculated, for example, by a transmitter and appended to a transmission. A receiver re-calculates the digest based on the data received, and compares it to the received digest. If the receiver-calculated digest does not match the transmitted digest, intentional or unintentional corruption of the transmitted data has occurred. Use of digests is optional and is determined by negotiations between an initiator and target during a login process.
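The transmit-side calculation and receive-side comparison described above may be sketched as follows. This is a minimal example only: the CRC-32 function from the Python standard library (zlib.crc32) is used as a stand-in checksum and is an assumption of this sketch, not necessarily the digest function negotiated under the IETF iSCSI Internet-Draft. The function names append_digest and verify_digest are hypothetical.

```python
import struct
import zlib

DIGEST_SIZE = 4  # bytes occupied by the stand-in CRC-32 digest


def append_digest(data: bytes) -> bytes:
    """Transmitter side: calculate a digest and append it to the data."""
    return data + struct.pack(">I", zlib.crc32(data))


def verify_digest(message: bytes) -> bool:
    """Receiver side: re-calculate the digest and compare it to the received one."""
    if len(message) < DIGEST_SIZE:
        return False
    data, received = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
    return struct.pack(">I", zlib.crc32(data)) == received


message = append_digest(b"header bytes and payload")

# Simulate corruption in transit by flipping one bit of the first byte.
corrupted = bytes([message[0] ^ 0x01]) + message[1:]
```

In this sketch, verify_digest(message) succeeds, while verify_digest(corrupted) fails because the re-calculated digest no longer matches the transmitted digest.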
PDUs 73 are transferred to a TCP layer 66, which implements functions of the OSI-RM Transport Layer, e.g., error checking and flow control, and generates one or more TCP segments 74 from PDUs 73. An IP layer 68 performs the functions of the OSI-RM Network Layer, e.g., routing and forwarding of packets in a network, producing IP packets 75. Finally, a lowest level Ethernet layer 70 implements the OSI-RM Data-Link Layer, performing synchronization for the physical transmission and handling low-level communications functions. Ethernet layer 70 transmits and receives data via a physical transmission medium (not shown in FIG. 2). Protocol stack 60 depicts transmitting, when viewed from SCSI layer 62 downward, and receiving, when viewed from Ethernet layer 70 upward.
FIG. 3 is a schematic block diagram depicting a flow 80 of SCSI transactions between an initiator and a target in an iSCSI architecture, as is known in the art. Flow 80 uses a protocol stack similar to that depicted in FIG. 2. Transmission begins with a user 82 initiating a storage request. User 82 performs substantially the same functions as user 20 in FIG. 1. The storage request may specify retrieval or storage of data, and is passed to a server 84 comprising a SCSI controller 86, an iSCSI transmit/receive device 87, and a TCP/IP protocol device 88. Server 84 corresponds to one of mail server 28, application server 30, or database server 32 described in reference to FIG. 1. The iSCSI protocol stack presented in FIG. 2 is implemented in iSCSI transmit/receive device 87, which may be implemented in software (e.g., a part of a computer operating system), hardware (e.g., a dedicated chip or board), or a combination of software and hardware, together with physical links 91 and 93.
SCSI controller 86 formulates the storage request in terms of one or more SCSI CDBs, substantially the same as CDBs 72 in FIG. 2. iSCSI transmit/receive device 87 constructs iSCSI PDUs, substantially the same as PDUs 73 in FIG. 2. TCP/IP protocol device 88 transforms the iSCSI PDUs into TCP segments, then into IP packets 92 (and also adds an Ethernet layer similar to layer 70), corresponding to TCP segments 74 and IP packets 75 in FIG. 2, respectively. IP packets 92 are transmitted in the direction of a target storage device 104 via physical links 91 and 93 and a TCP/IP network 90, which supports an Ethernet protocol corresponding to the Ethernet layer added. IP packets 92 are typically received by a storage server 100, which comprises a TCP/IP protocol device 96, an iSCSI receive/transmit device 98 and a SCSI controller 102, which are substantially similar in implementation to TCP/IP protocol device 88, iSCSI transmit/receive device 87 and SCSI controller 86.
IP packets 92 are decoded into TCP segments by TCP/IP protocol device 96, and the resulting TCP segments are processed by iSCSI receive/transmit device 98, which reconstructs iSCSI PDUs and handles iSCSI flow control. The SCSI commands are extracted from the resulting iSCSI PDUs, and passed to SCSI controller 102, which causes the execution of the commands on a storage device 104.
Data returned from storage device 104, called a SCSI response, flows in a reverse order, from storage device 104, through the components of storage server 100, via TCP/IP network 90 to server 84, and finally to user 82. It will be understood that FIG. 3 illustrates a sample configuration of an iSCSI architecture; numerous variations on this sample configuration are known in the art.
It will be clear from an examination of FIG. 3 that implementation of the iSCSI protocol comprises implementing a transmitter and a receiver function for both user 82 as an initiator of SCSI commands and device 104 as a target. As well, it is noted that communications between user 82 and device 104 occur over one or more TCP connections, as indicated schematically by arrows 91, 93, 95, and 97. A collection of TCP connections comprising communications between a specific initiator and a specific target is called a session, and is uniquely identified by a session ID number.
FIG. 4 is a schematic block diagram illustrating a mapping of TCP segments to iSCSI PDUs, assumed to be independent of each other, as is known in the art. FIG. 4 presents a stream of data 120, transmitted from server 84, which comprises three iSCSI PDUs: an iSCSI PDU 1, an iSCSI PDU 2, and an iSCSI PDU 3. Each iSCSI PDU comprises a header and, optionally, payload data, depending on the type of SCSI CDB or response contained in the PDU. Thus, iSCSI PDU 1 comprises Header 1 and Data 1, while iSCSI PDU 2 comprises only Header 2. To accomplish transmission over network 90 (FIG. 3), TCP/IP protocol device 88 partitions the three PDUs into five different TCP segments. Thus, TCP segment 1 comprises all of PDU 1's header and a portion 126 of its payload. TCP segment 2 comprises a further portion 128 of PDU 1's payload. TCP segment 3 comprises a last portion 130 of PDU 1's payload data, all of PDU 2's header (132), and a portion 134 of PDU 3's header. The IETF iSCSI Internet-Draft proposes mechanisms for delineating PDUs, i.e., determining beginning and ending boundaries for PDUs.
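The lack of alignment between TCP segment boundaries and PDU boundaries may be illustrated by the following sketch of a receiver that reassembles PDUs from a stream of segments. It assumes a hypothetical eight-byte header carrying a payload length, which is a simplification and not the header defined in the IETF iSCSI Internet-Draft, and it does not reproduce the delineation mechanisms proposed in that draft. The names TOY_HEADER, build_pdu, PduReassembler, and split_stream are assumptions of this sketch.

```python
import struct

# Hypothetical toy header: 1-byte PDU type, 3 pad bytes, 4-byte payload length.
TOY_HEADER = struct.Struct(">B3xI")


def build_pdu(pdu_type: int, payload: bytes = b"") -> bytes:
    return TOY_HEADER.pack(pdu_type, len(payload)) + payload


class PduReassembler:
    """Buffers TCP segment payloads and emits each PDU once it is complete."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, segment: bytes):
        self._buffer.extend(segment)
        pdus = []
        # A PDU can be delineated only after its whole header has arrived.
        while len(self._buffer) >= TOY_HEADER.size:
            pdu_type, length = TOY_HEADER.unpack_from(self._buffer)
            total = TOY_HEADER.size + length
            if len(self._buffer) < total:
                break  # payload still incomplete; wait for more segments
            pdus.append((pdu_type, bytes(self._buffer[TOY_HEADER.size:total])))
            del self._buffer[:total]
        return pdus


def split_stream(stream: bytes, cut_points):
    """Cut a byte stream into segments at arbitrary offsets."""
    bounds = [0, *cut_points, len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]


# Three PDUs (the second has a header only) carried in five segments.
stream = build_pdu(1, b"A" * 20) + build_pdu(2) + build_pdu(3, b"C" * 10)
segments = split_stream(stream, [12, 20, 40, 48])
reassembler = PduReassembler()
completed = [reassembler.feed(segment) for segment in segments]
```

With these cut points, the third segment carries the end of the first PDU, all of the second PDU, and part of the third PDU's header, so the first two PDUs complete together when that segment arrives, and the third PDU completes only with the fifth segment.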
Since speed and throughput are prime factors in any iSCSI implementation, many iSCSI receiver implementations comprise embedded logic on an integrated circuit, located, for example, in iSCSI receive/transmit device 98 (FIG. 3). Furthermore, high line speeds of 1 Gb/s and 10 Gb/s require multiple processors in order to avoid bottlenecks.
Implementing iSCSI across multiple processors raises numerous questions and problems. For example, generating a complete iSCSI implementation on each processor results in significant duplication, and waste of integrated circuit resources. Deciding how to allocate incoming TCP segments among multiple processors raises additional problems. For example, if segments are allocated according to load balancing considerations only, race conditions could result from messages from a single connection being processed in different processors which access a shared memory. Thus, additional logic and data would be required to synchronize access from the different processors. Alternatively, allocating messages according to connection, i.e., all messages from a given connection A are allocated to processor A, could cause great inefficiencies in the case of a dominant connection using most of the line capacity. There is thus a need for an improved method for multiple processors to support iSCSI.
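The trade-off between the two allocation approaches discussed above may be illustrated with the following sketch, which is a simplified model and not an implementation of any particular receiver. Connections are represented by integer identifiers, the traffic mix (one dominant connection carrying 90 of 100 segments) is an assumed example, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import cycle


def allocate_by_connection(segments, n_processors):
    """All segments of a given connection go to the same processor."""
    return [conn_id % n_processors for conn_id in segments]


def allocate_round_robin(segments, n_processors):
    """Segments are spread evenly, ignoring which connection they belong to."""
    processors = cycle(range(n_processors))
    return [next(processors) for _ in segments]


def summarize(segments, assignment, n_processors):
    """Return per-processor load and the connections handled by several processors."""
    load = Counter(assignment)
    processors_per_conn = defaultdict(set)
    for conn_id, processor in zip(segments, assignment):
        processors_per_conn[conn_id].add(processor)
    shared = sorted(c for c, p in processors_per_conn.items() if len(p) > 1)
    return [load[p] for p in range(n_processors)], shared


# Assumed traffic: connection 0 dominates the line; connections 1 and 2 are light.
segments = [0] * 90 + [1] * 5 + [2] * 5

by_conn = summarize(segments, allocate_by_connection(segments, 2), 2)
balanced = summarize(segments, allocate_round_robin(segments, 2), 2)
```

In this model, allocation by connection leaves one processor with 95 of the 100 segments but keeps every connection on a single processor, whereas round-robin allocation equalizes the load at 50 segments each but causes all three connections to be handled by both processors, which is the situation in which synchronized access to shared memory would be required.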