In recent years, networks of data processing systems (hereinafter simply referred to as computer networks), and particularly computer networks that rely on the TCP/IP protocol suite, have become very widespread.
One of the most common examples of a computer network carrying TCP/IP traffic is Ethernet, which, thanks to its simplicity and low implementation cost, has become the most popular networking technology for, e.g., LANs (Local Area Networks), particularly in SOHO (Small Office/Home Office) environments.
The data transfer speed of computer networks, and particularly of Ethernet links, has increased rapidly over the years, from 10 Mbps (megabits per second) to 10 Gbps.
The availability of high-speed network links is particularly important for transferring data between data storage devices over the network.
In this context, iSCSI, an acronym for “Internet SCSI” (Small Computer System Interface), has emerged as a protocol for efficiently transferring data between data storage devices over TCP/IP networks, and particularly over Ethernet. In very general terms, iSCSI is an end-to-end protocol used to exchange storage data between SCSI initiators (i.e., SCSI devices that start an Input/Output (I/O) process, e.g., application servers, or simply users' Personal Computers (PCs) or workstations) and SCSI targets (i.e., SCSI devices that respond to requests to perform I/O processes, e.g., storage devices), both the SCSI initiators and the SCSI targets being connected to a TCP/IP network.
The iSCSI protocol builds on two protocols that are each widely used in their own right: on the one hand, the SCSI protocol, which originates from the world of computer storage devices (e.g., hard disks), and, on the other hand, the TCP/IP protocol, which is ubiquitous in computer networks such as the Internet and Ethernet LANs.
Without going into excessive detail, since it is known per se, iSCSI is a SCSI transport protocol that uses message semantics to map the block-oriented SCSI storage protocol onto TCP/IP, which instead exposes a byte stream. SCSI commands can thus be transported over the TCP/IP network: a SCSI Command Descriptor Block (CDB) is encapsulated into an iSCSI data unit, called a Protocol Data Unit (PDU) or, loosely, a packet, which is handed to the TCP layer (and the lower processing layers) for transmission over the network to the intended SCSI target. Similarly, a response from the SCSI target is encapsulated into an iSCSI PDU and passed to the target's TCP layer (and lower processing layers) for transmission over the network back to the originating SCSI initiator.
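As an illustration of the encapsulation step, the following Python sketch packs a SCSI READ(10) CDB into the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU. The field offsets follow our reading of the iSCSI specification (RFC 7143); the function names `read10_cdb` and `scsi_command_pdu` are hypothetical, LUN encoding and task attributes are deliberately simplified, and no payload, digests, or Additional Header Segments are produced.

```python
import struct

BHS_LEN = 48            # every iSCSI PDU starts with a 48-byte Basic Header Segment
OP_SCSI_COMMAND = 0x01  # opcode of a SCSI Command PDU (initiator to target)
FLAG_FINAL = 0x80       # F bit: no unsolicited data follows
FLAG_READ = 0x40        # R bit: the command expects data from the target


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) CDB: opcode, flags, LBA, group, length, control."""
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


def scsi_command_pdu(cdb: bytes, itt: int, expected_len: int,
                     cmd_sn: int, exp_stat_sn: int, lun_field: int = 0) -> bytes:
    """Encapsulate a CDB into the BHS of a SCSI Command PDU (no payload, no AHS)."""
    if len(cdb) > 16:
        raise ValueError("CDBs longer than 16 bytes need an Additional Header Segment")
    bhs = bytearray(BHS_LEN)
    bhs[0] = OP_SCSI_COMMAND
    bhs[1] = FLAG_FINAL | FLAG_READ            # task attribute bits left at 0 (simplified)
    # Bytes 4..7 (TotalAHSLength, DataSegmentLength) stay 0: no AHS, no data segment.
    struct.pack_into(">Q", bhs, 8, lun_field)  # bytes 8..15: raw LUN field (simplified)
    # Bytes 16..31: Initiator Task Tag, Expected Data Transfer Length, CmdSN, ExpStatSN.
    struct.pack_into(">IIII", bhs, 16, itt, expected_len, cmd_sn, exp_stat_sn)
    bhs[32:32 + len(cdb)] = cdb                # bytes 32..47: CDB, zero-padded to 16 bytes
    return bytes(bhs)


pdu = scsi_command_pdu(read10_cdb(lba=2048, blocks=8), itt=1,
                       expected_len=8 * 512, cmd_sn=1, exp_stat_sn=1)
print(len(pdu), pdu.hex())
```

Once built, these 48 bytes are written to the TCP connection like any other data; nothing in the resulting byte stream marks where this PDU begins or ends.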
Recently, however, the rapid increase in network data transfer speeds, which has outpaced the processing capabilities of most data processors (Central Processing Units, or CPUs, and microprocessors), has started to pose some problems.
The iSCSI/TCP/IP protocol processing is usually carried out by software running on the central processors (CPUs) or microprocessors (the host central processors) of the PCs, workstations, server machines, or storage devices connected to the computer network. This is not a negligible task for the host central processors: for example, a 1 Gbps network link, rather common nowadays, may place a significant burden on the 2 GHz host central processor of, e.g., a computer or workstation acting as an application server in the computer network. The server's CPU may in fact easily spend half of its processing power on relatively low-level processing of the TCP/IP protocol aspects of the data sent and received over the computer network, inevitably reducing the processing power left available to the other running software applications.
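A back-of-the-envelope check of the 1 Gbps and 2 GHz figures, assuming traffic at full line rate in one direction and ignoring Ethernet/IP/TCP framing overhead:

```latex
\[
\frac{1\ \text{Gbit/s}}{8\ \text{bit/byte}} = 1.25\times10^{8}\ \text{byte/s},
\qquad
\frac{2\times10^{9}\ \text{cycle/s}}{1.25\times10^{8}\ \text{byte/s}} = 16\ \text{cycle/byte},
\qquad
\tfrac{1}{2}\times 16 = 8\ \text{cycle/byte}.
\]
```

Under these assumptions, spending half of the CPU on protocol processing means roughly 8 CPU cycles per byte transferred, so any additional per-byte work, such as checksum or CRC computation, weighs heavily.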
In other words, despite the impressive growth in computer network data transfer speeds, the relatively heavy processing overhead imposed by the iSCSI/TCP/IP protocol stack is one of the major bottlenecks against efficient data transfer and against further increases in data transfer rates over computer networks. Nowadays, then, the main obstacle to increasing the network data transfer rate is not the speed of the network itself, but the fact that the iSCSI/TCP/IP protocol stack is processed (by the CPUs of the networked SCSI devices exchanging storage data through the computer network) at a rate lower than the network speed. In a high-speed network, a CPU of a SCSI device may have to dedicate more processing resources to managing network traffic (e.g., reassembling data packets received out of order) than to executing the software application(s) it is running.
In particular, managing data integrity validation places a significant burden on the host CPU.
Each iSCSI data unit, or PDU, includes a PDU header portion and, optionally (depending on the PDU type), a PDU payload portion. By contrast, TCP/IP is a byte-stream protocol, which treats the data received from the Upper Layer Protocols (ULPs) as a simple stream of eight-bit bytes, without any particular boundaries. The TCP layer groups a certain number of bytes received from the ULPs into a so-called TCP segment, which is then transmitted. Thus, at the TCP layer there are no identifiable boundaries either between the internal portions of an iSCSI PDU or between different PDUs.
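Because TCP delivers only bytes, a receiver must rediscover PDU boundaries from the PDU headers themselves. The Python sketch below does this for a reassembled, in-order stream, assuming the header layout of RFC 7143 as we understand it (48-byte Basic Header Segment, TotalAHSLength in byte 4 counted in 4-byte words, a 3-byte DataSegmentLength in bytes 5 to 7, data padded to a 4-byte boundary, optional 4-byte header and data digests). The helper names are hypothetical, and the sketch assumes the stream starts exactly on a PDU boundary.

```python
from typing import Iterator, Tuple

BHS_LEN = 48  # Basic Header Segment length in bytes


def pad4(n: int) -> int:
    """Round n up to a multiple of 4 (data segments are padded to 4-byte words)."""
    return (n + 3) & ~3


def pdu_length(bhs: bytes, header_digest: bool, data_digest: bool) -> int:
    """Total on-the-wire length of one PDU, derived from its 48-byte header."""
    total_ahs_len = bhs[4] * 4                  # TotalAHSLength counts 4-byte words
    data_len = int.from_bytes(bhs[5:8], "big")  # DataSegmentLength is 3 bytes
    length = BHS_LEN + total_ahs_len
    if header_digest:
        length += 4                             # header digest (covers BHS and AHS)
    if data_len:
        length += pad4(data_len)
        if data_digest:
            length += 4                         # data digest
    return length


def split_pdus(stream: bytes, header_digest: bool = False,
               data_digest: bool = False) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) of each complete PDU in an in-order TCP byte stream."""
    off = 0
    while off + BHS_LEN <= len(stream):
        n = pdu_length(stream[off:off + BHS_LEN], header_digest, data_digest)
        if off + n > len(stream):
            break  # the rest of this PDU has not arrived yet
        yield off, n
        off += n
```

If a single header is corrupted, every boundary computed after it is wrong, which is one reason why a header digest is useful.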
Moreover, while TCP/IP intrinsically provides only weak data integrity validation (a simple checksum protects each TCP/IP segment), the iSCSI protocol implements a relatively strong mechanism for detecting data corruption, which allows up to two corruption-detection digests, or CRCs (Cyclic Redundancy Checks), per PDU: a first CRC may be provided in a PDU to protect the PDU header, and a second CRC may be provided to protect the PDU payload (when present).
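The difference in strength between the two mechanisms can be seen with a small Python sketch comparing the RFC 1071 ones'-complement checksum (the arithmetic behind the TCP checksum, shown here without the TCP pseudo-header) with CRC32C, the Castagnoli CRC that iSCSI uses for its digests. The bitwise CRC loop is written for clarity, not speed.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement sum of 16-bit words (TCP's checksum arithmetic)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reflected


def crc32c(data: bytes) -> int:
    """Bitwise, reflected CRC32C with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


original = bytes.fromhex("12345678")
reordered = bytes.fromhex("56781234")  # the two 16-bit words swapped
print(hex(internet_checksum(original)), hex(internet_checksum(reordered)))  # 0x9753 0x9753
print(hex(crc32c(original)), hex(crc32c(reordered)))
```

Swapping two 16-bit words leaves the ones'-complement sum unchanged, so the checksum cannot detect the reordering, whereas the CRC is position-sensitive and flags it.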
Generating the CRCs for the PDUs is one of the most computing-intensive aspects of an iSCSI implementation, particularly because TCP segments commonly need to be retransmitted when they are not acknowledged by the intended destination (for example, due to packet loss). A retransmission involves resending TCP segments starting from the first byte, in the byte stream received at the TCP layer, of the first unacknowledged segment. Since the TCP layer is not aware of any iSCSI PDU boundary, generating CRCs for retransmitted segments is particularly complex.
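To see why retransmission complicates digest generation, consider a hypothetical offload engine that records where each PDU starts in the TCP stream. The Python sketch below (the function name and bookkeeping are illustrative assumptions, not a known design) finds which PDUs a retransmitted byte range touches and flags those in which the range begins mid-PDU, since recomputing such a PDU's digest needs bytes that precede the retransmitted range.

```python
from bisect import bisect_right
from typing import List, Tuple


def pdus_for_retransmit(pdu_starts: List[int], pdu_lengths: List[int],
                        seg_start: int, seg_end: int) -> List[Tuple[int, bool]]:
    """Return (pdu_index, begins_mid_pdu) for each PDU intersecting [seg_start, seg_end).

    pdu_starts (sorted stream offsets) and pdu_lengths are hypothetical bookkeeping:
    TCP itself keeps no record of where iSCSI PDUs begin or end.
    """
    hits = []
    # Last PDU starting at or before seg_start: it may straddle the range start.
    i = max(bisect_right(pdu_starts, seg_start) - 1, 0)
    while i < len(pdu_starts) and pdu_starts[i] < seg_end:
        if pdu_starts[i] + pdu_lengths[i] > seg_start:
            # True means the range starts inside this PDU, so recomputing its
            # digest needs bytes that precede the retransmitted range.
            hits.append((i, pdu_starts[i] < seg_start))
        i += 1
    return hits


# Three PDUs of 56, 48 and 60 bytes; the peer acknowledged bytes up to offset 50.
print(pdus_for_retransmit([0, 56, 104], [56, 48, 60], 50, 120))
# [(0, True), (1, False), (2, False)]
```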
Solutions have been proposed for at least partially relieving the host central processors of application servers, file servers, PCs, workstations, and storage devices from the burden of processing the low-level TCP/IP aspects of network traffic. Some of these known devices are referred to as TCP/IP Offload Engines (TOEs).
Basically, a TOE offloads the processing of TCP/IP protocol aspects from the host central processor to distinct (accelerator) hardware, typically embedded in the Network Interface Card (NIC) through which, e.g., the PC or workstation is connected to the computer network.
A TOE can be implemented in different ways: as a discrete, processor-based component with dedicated firmware, as an ASIC-based component, or as a combination of the two.
By offloading TCP/IP protocol processing, the host CPU is at least partially relieved of the significant burden of processing the protocol stack and can devote more of its processing resources to the running applications.
However, since the TCP/IP protocol stack was originally defined and developed for software implementation, implementing its processing in hardware poses non-negligible problems, such as how to achieve an effective performance improvement without introducing new bottlenecks in a scaled-up implementation, and how to design an interface to the ULPs, which are still implemented in software and handled by the host CPU.
Offloading from the host CPU only the processing of TCP/IP protocol aspects, as known TOEs do, may not be sufficient to significantly reduce the processing resources that the host CPU has to devote to handling data traffic over the network: some aspects peculiar to the iSCSI protocol may still place a significant burden on the host CPU. Therefore, to effectively relieve the host CPU, TOEs should also perform some iSCSI processing in addition to TCP/IP protocol processing.
In particular, one aspect of iSCSI that can advantageously be offloaded from the host CPU to a hardware accelerator is the handling (generation and validation) of the CRCs of iSCSI PDUs.
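A minimal sketch of the generation/validation pair that such an accelerator would take over, written here in Python with a byte-at-a-time lookup table (a common software/firmware formulation of CRC32C; hardware designs often use parallel XOR logic instead). `append_digest` and `verify_digest` are hypothetical names, and serializing the digest little-endian is an assumption based on our reading of the CRC examples in RFC 3720; check the specification before relying on it.

```python
def _crc32c_table() -> list:
    """Precompute the 256-entry lookup table for the reflected CRC32C polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ (0x82F63B78 if c & 1 else 0)
        table.append(c)
    return table


CRC32C_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC32C: one lookup per input byte instead of eight shifts."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def append_digest(segment: bytes) -> bytes:
    """Sender side: append a 4-byte CRC32C digest (little-endian is an assumption)."""
    return segment + crc32c(segment).to_bytes(4, "little")


def verify_digest(framed: bytes) -> bool:
    """Receiver side: recompute the CRC32C and compare it with the trailing digest."""
    if len(framed) < 4:
        return False
    body, digest = framed[:-4], framed[-4:]
    return crc32c(body).to_bytes(4, "little") == digest
```

In this sketch the sender would call `append_digest` on the header (and on the data segment, if a data digest is negotiated), and the receiver would call `verify_digest` and treat the PDU as corrupted on a mismatch.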