1. Field of the Invention
The present invention relates to network systems, and more particularly, to efficiently using buffer space.
2. Background of the Invention
Storage area networks (“SANs”) are commonly used where plural memory storage devices are made available to various host computing systems. Data in a SAN is typically moved from plural host systems (which include computer systems, servers, etc.) to a storage system through various controllers/adapters.
Host systems often communicate with storage systems via a host bus adapter (“HBA”, also referred to as a “controller” and/or “adapter”) using an interface, for example, the “PCI” bus interface. PCI stands for Peripheral Component Interconnect, a local bus standard that was developed by Intel Corporation®. The PCI standard is incorporated herein by reference in its entirety. Most modern computing systems include a PCI bus in addition to a more general expansion bus (e.g., the ISA bus). PCI is a 32-bit or 64-bit bus and can run at clock speeds of 33 or 66 MHz.
PCI-X is another standard bus that is backward compatible with existing PCI cards. PCI-X improves the data transfer rate of PCI from 132 MBps to as much as 1 GBps. The PCI-X standard was developed by IBM®, Hewlett Packard Corporation® and Compaq Corporation® to increase the performance of high-bandwidth devices, such as Gigabit Ethernet and Fibre Channel adapters, and of processors that are part of a cluster.
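The bandwidth figures quoted for PCI and PCI-X follow from bus width multiplied by clock rate. The sketch below reproduces that arithmetic as a theoretical peak. It assumes one full-width transfer per clock and ignores arbitration, wait states and protocol overhead. It also assumes that the "1 GBps" PCI-X figure refers to the 64-bit, 133 MHz mode. The function name is illustrative only.

```python
def peak_bandwidth_mbps(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak rate in MB/s for a parallel bus that moves one
    full-width word per clock cycle (no protocol overhead assumed)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz

# Nominal clock values; the real PCI clock is 33.33 MHz, which is why
# some sources quote 133 MBps rather than 132 MBps.
for name, width, clock in [
    ("PCI 32-bit @ 33 MHz", 32, 33),
    ("PCI 64-bit @ 66 MHz", 64, 66),
    ("PCI-X 64-bit @ 133 MHz", 64, 133),
]:
    print(f"{name}: {peak_bandwidth_mbps(width, clock):.0f} MB/s")
```

The 32-bit, 33 MHz case gives the 132 MBps baseline. The 64-bit, 133 MHz PCI-X case gives about 1064 MBps, which is roughly 1 GBps.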
Various other standard interfaces are also used to move data from host systems to storage devices. Internet SCSI (iSCSI), as defined by the Internet Engineering Task Force (IETF), is one such standard; it maps the standard SCSI protocol on top of the TCP/IP protocol. iSCSI (incorporated herein by reference in its entirety) is based on the Small Computer Systems Interface (“SCSI”), which enables host computer systems to perform block data input/output (“I/O”) operations with a variety of peripheral devices, including disk and tape devices, optical storage devices, printers, and scanners.
A traditional SCSI connection between a host system and a peripheral device uses parallel cabling and is limited by distance and device support constraints. For storage applications, iSCSI was developed to take advantage of network architectures based on the Fibre Channel and Gigabit Ethernet standards. iSCSI leverages the SCSI protocol over established network infrastructures and enables block storage applications over TCP (Transmission Control Protocol)/IP (Internet Protocol) networks by defining a mapping of the SCSI protocol onto TCP/IP.
Networks are generally defined as having layers of protocol. The iSCSI and TCP/IP protocol suite consists of four protocol layers: the application layer (of which iSCSI is one application), the transport layer (TCP), the network layer (IP), and the link layer (e.g., Ethernet). A complete description of the TCP/IP protocol suite is provided in TCP/IP Illustrated, Vol. 1, by W. Richard Stevens, and Vol. 2, by Gary R. Wright and W. Richard Stevens, published in the Addison-Wesley Professional Computing Series. The following provides a brief overview of the TCP, iSCSI and RDMA protocols/standards.
TCP is a network protocol that provides a connection-oriented, reliable, byte stream service. Connection-oriented means that two nodes must establish a logical connection before sending data and that TCP maintains state information regarding the data transfer. Reliable means that data is guaranteed to be delivered, and delivered in the same order in which it was sent. Byte stream means that TCP views the data to be sent as a continuous stream of bytes, which it may segment in any way it sees fit, and delivers it to the remote node as a byte stream. There is no concept of a data frame boundary in a TCP data stream.
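Because TCP preserves byte order but not message boundaries, a receiver must find message boundaries itself. The sketch below shows this with a simple length-prefixed framing. It assumes a hypothetical 4-byte big-endian length field, which is not the iSCSI header format. The same byte stream is cut at different points, as TCP segmentation might cut it. The `frame` and `Deframer` names are illustrative only.

```python
import struct
from typing import List

def frame(msg: bytes) -> bytes:
    # Hypothetical framing: 4-byte big-endian length prefix, then the payload.
    return struct.pack(">I", len(msg)) + msg

class Deframer:
    """Recovers length-prefixed messages from a byte stream that arrives in
    arbitrarily sized chunks (byte order is preserved, boundaries are not)."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        self._buf += chunk
        out = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from(">I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # message incomplete; wait for more bytes
            out.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return out

stream = frame(b"hello") + frame(b"iSCSI world")
# Deliver the same stream split at different points.
for cut_points in ([3], [9, 10], list(range(1, len(stream)))):
    d, msgs, prev = Deframer(), [], 0
    for cut in cut_points + [len(stream)]:
        msgs += d.feed(stream[prev:cut])
        prev = cut
    assert msgs == [b"hello", b"iSCSI world"]
```

Every split yields the same two messages. This is why protocols carried over TCP, iSCSI included, put length information in their own headers.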
iSCSI Architecture Overview
The iSCSI architecture is based on a client/server model. Typically, the client is a host system such as a file server that issues a read or write command. The server may be a disk array that responds to the client request.
The following introduces some of the basic terms used in an iSCSI data transfer:
“Exchange”—The operations needed to perform an iSCSI data read or write. An exchange consists of three operational phases: the command phase, the data movement phase and the response phase.
“Initiator”—Typically the client is the initiator that initiates a read or write command.
“Target”—Typically a disk array is the target that accepts a read or write command and performs the requested operation.
“Read/Write”—Reads and writes are defined from the perspective of the initiator.
In a typical iSCSI exchange, an initiator sends a “read” or “write” command to a target. For a read operation, the target sends the requested data to the initiator. For a write command, the target sends a “Ready to Transfer” (“R2T”) Protocol Data Unit (“PDU”) informing the initiator that the target is ready to accept the write data. The initiator then sends the write data to the target. Once the data is transferred, the exchange enters the response phase: the target sends a response PDU to the initiator with the status of the operation. Once the initiator receives this response, the exchange is complete. TCP guarantees the reliable, in-order delivery of the PDUs.
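The exchange described above can be sketched as an ordered list of PDUs, each tagged with its phase. This is a simplified model under several assumptions: there is no immediate or unsolicited write data, the target issues a single R2T for a write, and status is always sent in a separate response PDU rather than folded into the final Data-In PDU. The PDU names follow the iSCSI specification. The `exchange` function and `Phase` enum are hypothetical.

```python
from enum import Enum
from typing import List, Tuple

class Phase(Enum):
    COMMAND = "command"
    DATA = "data movement"
    RESPONSE = "response"

def exchange(op: str, data_pdus: int = 1) -> List[Tuple[str, str, Phase]]:
    """Return (sender, PDU name, phase) tuples for a simplified exchange."""
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    steps = [("initiator", "SCSI Command", Phase.COMMAND)]
    if op == "read":
        # Target returns the requested data to the initiator.
        steps += [("target", "SCSI Data-In", Phase.DATA)] * data_pdus
    else:
        # Target signals readiness, then the initiator sends the write data.
        steps.append(("target", "R2T", Phase.DATA))
        steps += [("initiator", "SCSI Data-Out", Phase.DATA)] * data_pdus
    # Target reports the status of the operation; the exchange is complete.
    steps.append(("target", "SCSI Response", Phase.RESPONSE))
    return steps

for sender, pdu, phase in exchange("write"):
    print(f"{sender:>9} -> {pdu:<14} [{phase.value}]")
```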
Typically, logical units in the target process commands. Commands are sent by the host system in Command Descriptor Blocks (“CDBs”). A CDB is sent to a specific logical unit; for example, the CDB may include a command to read a specific number of data blocks. The target's logical unit transfers the requested data blocks to the initiator and then sends a status message indicating completion of the request. iSCSI encapsulates CDB transactions between initiators and targets over TCP/IP networks.
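As an example of a CDB that reads a specific number of data blocks, the sketch below builds a SCSI READ(10) command. The layout comes from the SCSI Block Commands standard and is not specified in this document. Byte 0 holds operation code 0x28. Bytes 2 to 5 hold a 32-bit logical block address, bytes 7 and 8 hold a 16-bit transfer length in blocks, and the flag, group and control bytes are left at zero. The `read10_cdb` helper is illustrative only.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code

def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte READ(10) CDB: opcode, flags, 32-bit LBA, group number,
    16-bit transfer length (in logical blocks), control byte."""
    if not (0 <= lba < 2**32 and 0 <= blocks < 2**16):
        raise ValueError("LBA or transfer length out of range for READ(10)")
    # ">" = big-endian with no padding: 1 + 1 + 4 + 1 + 2 + 1 = 10 bytes.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)

print(read10_cdb(0x1000, 8).hex())  # 28000000100000000800
```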
iSCSI PDUs may vary greatly in size, from a few bytes to hundreds of kilobytes. Normally, the size of the data will be known before it is received, and a host computing system can allocate buffers of proper size and assign them to be used when data is received. However, under the iSCSI standard, data may also be transferred along with a command, before a receiving host system can allocate receive buffers.
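A receiver learns the size of an incoming PDU from its header before the data itself arrives. The sketch below computes that size from the 48-byte iSCSI Basic Header Segment (BHS). Byte 4 (TotalAHSLength) counts additional header segments in 4-byte words. Bytes 5 to 7 (DataSegmentLength) give the payload length in bytes, and the data segment is padded to a 4-byte boundary. The sketch assumes that neither header nor data digests were negotiated. The `pdu_length` helper is hypothetical.

```python
BHS_LEN = 48  # the iSCSI Basic Header Segment is always 48 bytes

def pdu_length(bhs: bytes) -> int:
    """Total on-the-wire length of one iSCSI PDU, read from its BHS.

    Assumes HeaderDigest and DataDigest are both disabled."""
    if len(bhs) < BHS_LEN:
        raise ValueError("need the full 48-byte BHS")
    ahs_bytes = bhs[4] * 4                          # TotalAHSLength, in 4-byte words
    data_seg_len = int.from_bytes(bhs[5:8], "big")  # DataSegmentLength, 24 bits
    padded = (data_seg_len + 3) & ~3                # pad data to a 4-byte boundary
    return BHS_LEN + ahs_bytes + padded
```

A PDU with a 5-byte data segment and no additional header segments therefore occupies 48 + 8 = 56 bytes on the wire.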
When this occurs, data may be transferred to unassigned, pre-allocated (small or large) buffers. The choice between small and large buffers involves efficiency tradeoffs that depend on the size of the data received. Using only small buffers is efficient for small PDUs, because little space in each buffer goes unused. However, when large amounts of data are transferred to small buffers, the buffers must be linked by a scatter/gather list, which requires intensive processing.
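The sketch below shows why small buffers become costly for large PDUs. It spreads one PDU across fixed-size buffers and describes the result with a scatter/gather list that has one entry per buffer used. The 512-byte buffer size and 64 KiB PDU size are hypothetical values chosen for illustration, as are the `SGEntry` and `build_sg_list` names.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SGEntry:
    buffer_id: int  # index of a pre-allocated buffer (hypothetical pool)
    length: int     # bytes of the PDU stored in that buffer

def build_sg_list(pdu_len: int, buf_size: int) -> List[SGEntry]:
    """Spread one PDU across fixed-size buffers; return one entry per buffer."""
    if pdu_len < 0 or buf_size <= 0:
        raise ValueError("invalid sizes")
    entries, remaining, idx = [], pdu_len, 0
    while remaining > 0:
        n = min(buf_size, remaining)
        entries.append(SGEntry(idx, n))
        remaining -= n
        idx += 1
    return entries

sg = build_sg_list(64 * 1024, 512)  # a 64 KiB PDU in 512-byte buffers
print(len(sg), "entries")           # 128 entries to allocate, link and walk
```

Every entry must be allocated, linked and later traversed, so the per-PDU processing grows linearly with PDU size when only small buffers are available.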
If only large pre-allocated buffers are used, the large buffers are underutilized when small PDUs are received, which wastes buffer space.
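The sketch below quantifies the tradeoff between the two single-size policies. For a given PDU size, it counts the buffers used and the bytes left unused under each policy. The 512-byte and 16 KiB buffer sizes and the sample PDU sizes are hypothetical, and the `buffers_and_waste` helper is illustrative only.

```python
from typing import Tuple

SMALL_BUF = 512        # hypothetical small-buffer size, in bytes
LARGE_BUF = 16 * 1024  # hypothetical large-buffer size, in bytes

def buffers_and_waste(pdu_len: int, buf_size: int) -> Tuple[int, int]:
    """Return (buffers used, unused bytes) for one PDU in fixed-size buffers."""
    used = -(-pdu_len // buf_size)  # ceiling division using integers only
    return used, used * buf_size - pdu_len

for pdu_len in (48, 4096, 256 * 1024):
    s_used, s_waste = buffers_and_waste(pdu_len, SMALL_BUF)
    l_used, l_waste = buffers_and_waste(pdu_len, LARGE_BUF)
    print(f"{pdu_len:>7} B PDU | small: {s_used:>3} bufs, {s_waste:>5} B unused"
          f" | large: {l_used:>2} bufs, {l_waste:>5} B unused")
```

With these sizes, a 48-byte PDU leaves 16,336 bytes of a large buffer unused. A 256 KiB PDU needs 512 small buffers, compared with 16 large ones.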
Therefore, there is a need for a system and method for efficiently using buffer space to handle variable iSCSI PDU sizes.