1. Field of the Invention
The present invention relates to network systems and, more particularly, to processing markers, data integrity fields and digests.
2. Background of the Invention
Storage area networks (“SANs”) are commonly used where plural memory storage devices are made available to various host computing systems. Data in a SAN is typically moved from plural host systems (that include computer systems, servers etc.) to a storage system through various controllers/adapters.
Host systems often communicate with storage systems via a host bus adapter (“HBA”, may also be referred to as a “controller” and/or “adapter”) using an interface, for example, the “PCI” bus interface. PCI stands for Peripheral Component Interconnect, a local bus standard that was developed by Intel Corporation®. The PCI standard is incorporated herein by reference in its entirety. Most modern computing systems include a PCI bus in addition to a more general expansion bus (e.g. the ISA bus). PCI is a 64-bit bus and can run at clock speeds of 33 or 66 MHz.
PCI-X is another standard bus that is compatible with existing PCI cards using the PCI bus. PCI-X improves the data transfer rate of PCI from 132 MBps to as much as 1 GBps. The PCI-X standard was developed by IBM®, Hewlett Packard Corporation® and Compaq Corporation® to increase performance of high bandwidth devices, such as those based on the Gigabit Ethernet and Fibre Channel standards, and of processors that are part of a cluster.
Various other standard interfaces are also used to move data from host systems to storage devices. Internet SCSI (iSCSI) is one such standard. As defined by the Internet Engineering Task Force (IETF), iSCSI maps the standard SCSI protocol on top of the TCP/IP protocol. iSCSI (incorporated herein by reference in its entirety) is based on Small Computer Systems Interface (“SCSI”), which enables host computer systems to perform block data input/output (“I/O”) operations with a variety of peripheral devices including disk and tape devices, optical storage devices, as well as printers and scanners.
A traditional SCSI connection between a host system and peripheral device is through parallel cabling and is limited by distance and device support constraints. For storage applications, iSCSI was developed to take advantage of network architectures based on Fibre Channel and Gigabit Ethernet standards. iSCSI leverages the SCSI protocol over established networked infrastructures and defines the means for enabling block storage applications over TCP (Transmission Control Protocol)/IP (Internet Protocol) networks. iSCSI defines mapping of the SCSI protocol with TCP/IP.
Networks are generally defined as having layers of protocol. The iSCSI and TCP/IP protocol suite consists of four protocol layers: the application layer (of which iSCSI is one application), the transport layer (TCP), the network layer (IP) and the link layer (e.g. Ethernet). A complete description of the TCP/IP protocol suite is provided in “TCP/IP Illustrated”, Vol. 1 by W. Richard Stevens and Vol. 2 by Gary R. Wright and W. Richard Stevens, published by Addison Wesley Professional Computing Series. The following provides a brief overview of the TCP, iSCSI and RDMA protocols/standards.
TCP Overview
TCP is a network protocol that provides connection-oriented, reliable, byte stream service. Connection-oriented means that two nodes must establish a logical connection before sending data and that TCP maintains state information regarding the data transfer. Reliable means that data is guaranteed to be delivered in the same order that it was sent. A byte stream service means that TCP views the data to be sent as a continuous data stream, divides it into segments in any way it sees fit, and delivers it to the remote node as a byte stream. There is no concept of a data frame boundary in a TCP data stream.
Sequence Numbering in TCP Data Transfer
Each byte of data sent using a TCP connection is tagged with a sequence number. Each TCP segment header contains the sequence number of the first byte of data in the segment. This sequence number is incremented for each byte of data sent so that when the next segment is to be sent, the sequence number is again set for the first byte of data for that segment. The sequence numbering is used to determine when data is lost during delivery and needs to be retransmitted.
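The sequence numbering described above can be illustrated with a minimal sketch. The function names, the 32-bit wrap-around handling, and the assumption that fewer than 2^32 bytes are outstanding at once are illustrative choices and are not taken from this description.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def next_sequence_number(seq, payload_len):
    """Sequence number carried in the header of the segment that follows."""
    return (seq + payload_len) % SEQ_MODULUS


def find_gaps(segments, initial_seq):
    """Return (start_seq, length) ranges that are missing from the stream.

    segments is an iterable of (seq, payload_len) pairs for the segments
    received so far, in any order. Assumes fewer than 2**32 bytes in total.
    """
    # Work with offsets relative to the initial sequence number so that
    # wrap-around does not disturb the ordering.
    relative = sorted(((seq - initial_seq) % SEQ_MODULUS, length)
                      for seq, length in segments)
    gaps = []
    expected = 0  # relative offset of the next byte we expect to see
    for offset, length in relative:
        if offset > expected:
            gaps.append(((initial_seq + expected) % SEQ_MODULUS,
                         offset - expected))
        expected = max(expected, offset + length)
    return gaps
```

For example, if segments starting at sequence numbers 1000 (100 bytes) and 1200 (50 bytes) arrive, `find_gaps` reports that the 100 bytes starting at 1100 are missing and would need to be retransmitted.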
iSCSI Architecture Overview
The iSCSI architecture is based on a client/server model. Typically, the client is a host system such as a file server that issues a read or write command. The server may be a disk array that responds to the client request.
The following introduces some of the basic terms used in an iSCSI data transfer:
“Exchange”: The operations needed to do an iSCSI data read or write. An exchange consists of three operational phases: command phase, data movement phase and response phase.
“Initiator”: Typically the client is the initiator that initiates a read or write command.
“Target”: Typically a disk array is the target that accepts a read or write command and performs the requested operation.
“Read/Write”: Reads or writes are defined from the point of view of the initiator.
In a typical iSCSI exchange, an initiator sends a “read” or “write” command to a target. For a read operation, the target sends the requested data to the initiator. For a write command, the target sends a “Ready to Transfer Protocol Data Unit (“PDU”)” informing the initiator that the target is ready to accept the write data. The initiator then sends the write data to the target. Once the data is transferred, the exchange enters the response phase. The target then sends a response PDU to the initiator with the status of the operation. Once the initiator receives this response, the exchange is complete. The use of TCP guarantees the delivery of the PDUs.
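The order of PDUs in the exchange described above can be sketched as a simple list of steps. The function name and the PDU labels are hypothetical and chosen for readability; they are not the PDU names used by the iSCSI standard.

```python
def exchange_pdus(operation):
    """Return the ordered (sender, receiver, pdu) steps of one exchange."""
    if operation not in ("read", "write"):
        raise ValueError("operation must be 'read' or 'write'")

    # Command phase: the initiator always starts the exchange.
    steps = [("initiator", "target", operation + " command")]

    # Data movement phase.
    if operation == "read":
        steps.append(("target", "initiator", "read data"))
    else:
        # For a write, the target first signals that it can accept data.
        steps.append(("target", "initiator", "ready to transfer"))
        steps.append(("initiator", "target", "write data"))

    # Response phase: the target reports the status of the operation.
    steps.append(("target", "initiator", "response"))
    return steps
```

Calling `exchange_pdus("write")` yields four steps, one more than a read, because of the Ready to Transfer PDU.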
Typically, logical units in the target process commands. Commands are sent by the host system in Command Descriptor Blocks (“CDB”). A CDB is sent to a specific logical unit, for example, the CDB may include a command to read a specific number of data blocks. The target's logical unit transfers the requested data block to the initiator, terminating with a status message indicating completion of the request. iSCSI encapsulates CDB transactions between initiators and targets over TCP/IP networks.
“RDMA” Overview:
Remote direct memory access (RDMA) is a communications technique that allows data to be transmitted from the memory of one computer to the memory of another computer without passing through either device's central processing unit (“CPU”), and without calling an operating system kernel. RDMA is a response to increasing demands for network speed. Data can be transferred faster when it does not have to pass through the CPU. The Infiniband standard (incorporated herein by reference in its entirety) is an example of a form of RDMA. Applications of RDMA include clustering and storage and networking for data centers.
Markers, Data Integrity Fields (“DIFs”) and Digests:
Embedded in a stream of iSCSI or RDMA data, there are three fields that may need to be located for processing by a receiving node. These fields are referred to as Markers, DIFs, and Digests. Each of these fields may or may not be present in a data stream regardless of the presence of the other fields. The locations of the fields in a data stream are determined independently of one another, but the presence of one field can have an effect on locating the other fields.
Markers:
Markers are inserted into a data stream periodically at a predetermined interval, starting at a given TCP sequence number. Markers have a fixed length, and indicate the offset to the start of the next protocol data unit (“PDU”). iSCSI markers are 8 bytes long, while RDMA markers are 4 bytes long. Insertion of iSCSI markers into the data stream is performed (logically) after insertion of digests and/or DIFs. Thus, iSCSI markers are not included in the Cyclic Redundancy Check (CRC) calculation for either of those fields.
RDMA markers are inserted into a data stream (logically) after the insertion of DIFs, but prior to insertion of Digests. Thus, RDMA markers are not included in the calculation of the DIF CRC, but are included in the Digest CRC calculation.
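The following minimal sketch shows how a receiving node might locate markers in a received segment. It assumes that the interval is measured in TCP sequence space from the start of one marker to the start of the next, that the segment does not precede the first marker, and that fewer than 2^32 bytes have been sent since the first marker. The names `marker_offsets`, `MARKER_LENGTH` and `MARKER_IN_DIGEST_CRC` are hypothetical.

```python
SEQ_MODULUS = 2 ** 32

# Marker lengths and digest coverage as stated in the text above.
MARKER_LENGTH = {"iscsi": 8, "rdma": 4}
MARKER_IN_DIGEST_CRC = {"iscsi": False, "rdma": True}
MARKER_IN_DIF_CRC = {"iscsi": False, "rdma": False}


def marker_offsets(segment_seq, segment_len, first_marker_seq, spacing):
    """Offsets within a segment at which a marker begins.

    spacing is the assumed distance, in sequence space, between the start
    of one marker and the start of the next. A marker that began in an
    earlier segment and continues into this one is not reported.
    """
    # Distance from the first marker to the first byte of this segment.
    delta = (segment_seq - first_marker_seq) % SEQ_MODULUS
    # Offset of the first marker start at or after the segment start.
    first = (-delta) % spacing
    return list(range(first, segment_len, spacing))
```

For a 1500-byte segment starting at sequence number 1000, with markers every 512 bytes from sequence number 0, the markers start at offsets 24, 536 and 1048 of the segment, that is, at sequence numbers 1024, 1536 and 2048.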
DIFs:
DIFs are 8-byte fields appended to each block of data stored on a mass storage device. A DIF contains a Reference Tag, an Application Tag, and a CRC value. During a direct memory access (DMA) transfer, the CRC for the DIF of each data block must be calculated. Depending on the application, a system may need to insert DIFs periodically into an incoming data stream, validate and remove them from the data stream, or validate them and keep them in the data stream. These are the three modes for processing DIFs. Calculation of the DIF CRC does not include Markers or Digests.
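The three DIF processing modes can be sketched as follows. Several details are assumptions made only for illustration and are not stated in this description: a 512-byte block size, a DIF layout of a 2-byte CRC followed by a 2-byte Application Tag and a 4-byte Reference Tag in big-endian order, a 16-bit CRC with polynomial 0x8BB7, and a Reference Tag that increments by one per block. Tag checking is omitted; only the CRC is validated.

```python
import struct

BLOCK_SIZE = 512                # assumed block size
DIF_SIZE = 8                    # DIFs are 8 bytes long
_DIF = struct.Struct(">HHI")    # assumed layout: CRC, app tag, ref tag


def crc16(data, poly=0x8BB7):
    """Bitwise 16-bit CRC (assumed polynomial, zero initial value)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def insert_difs(data, app_tag=0, first_ref_tag=0):
    """Mode 1: append a DIF to every block of the stream."""
    if len(data) % BLOCK_SIZE:
        raise ValueError("data must be a whole number of blocks")
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        ref_tag = (first_ref_tag + i // BLOCK_SIZE) & 0xFFFFFFFF
        out += block + _DIF.pack(crc16(block), app_tag, ref_tag)
    return bytes(out)


def validate_difs(protected, strip):
    """Modes 2 and 3: validate every DIF, then remove (strip=True) or keep it."""
    step = BLOCK_SIZE + DIF_SIZE
    if len(protected) % step:
        raise ValueError("stream must be a whole number of protected blocks")
    out = bytearray()
    for i in range(0, len(protected), step):
        block = protected[i:i + BLOCK_SIZE]
        crc, _app_tag, _ref_tag = _DIF.unpack(protected[i + BLOCK_SIZE:i + step])
        if crc != crc16(block):
            raise ValueError("DIF CRC mismatch in block %d" % (i // step))
        out += block if strip else protected[i:i + step]
    return bytes(out)
```

In this sketch the CRC covers only the block data, which is consistent with the statement above that Markers and Digests are excluded from the DIF CRC.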
Digests:
Digests are 4-byte fields appended to the end of a PDU, which are a CRC calculation over the data portion of the PDU. DIFs are included in the Digest calculation for both iSCSI and RDMA. Markers are not included in the iSCSI Digest calculation, but are included in the RDMA Digest calculation.
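A minimal sketch of appending and checking a digest follows. The text above states only that a digest is a 4-byte CRC. The use of the CRC32C polynomial (the one specified for iSCSI digests) and the little-endian byte order of the appended field are assumptions of this sketch, and the function names are hypothetical. The caller is assumed to pass the bytes that the digest covers, which include DIFs and, for RDMA only, Markers.

```python
import struct

DIGEST_SIZE = 4  # digests are 4 bytes long


def crc32c(data):
    """Bitwise CRC32C (reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def append_digest(covered):
    """Return the covered bytes followed by their 4-byte digest."""
    # Byte order of the digest field is an assumption of this sketch.
    return covered + struct.pack("<I", crc32c(covered))


def check_digest(data_with_digest):
    """True if the trailing 4 bytes match the CRC of the preceding bytes."""
    if len(data_with_digest) < DIGEST_SIZE:
        return False
    covered = data_with_digest[:-DIGEST_SIZE]
    (digest,) = struct.unpack("<I", data_with_digest[-DIGEST_SIZE:])
    return digest == crc32c(covered)
```

A bitwise loop is used here for clarity. A hardware or table-driven implementation would produce the same values.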
Typically, when data is received from the network and is first stored in the HBA's local memory, the data may not be in order and may or may not include markers, DIFs and digests. Processing the markers, DIFs and digests before data is sent to the host (or when data is being sent by the host) can be cumbersome and can affect overall data transfer efficiency.
In conventional systems, Markers, DIFs, and Digests are processed independently at different points in a data stream transfer. This has disadvantages because there is no overlapping protection of data by both DIF and Digest and data may get corrupted. Also, iSCSI and RDMA treat calculation of digests with respect to markers differently, so logic would need to be duplicated if both protocols were to be supported.
In data transferred by a host system, markers, DIFs, and digests are typically inserted at different stages of the data path by conventional systems. This approach has problems because there is no overlapping protection of data by DIFs and Digests and data may get corrupted. Also, iSCSI and RDMA treat calculation of digests with respect to markers differently. In conventional systems, separate logic is needed if both protocols are to be supported. The cost of this separate logic makes conventional systems expensive and cumbersome.
Therefore, there is a need for a system and method that can efficiently handle markers, digests and DIFs in network data streams.