Remote Direct Memory Access (RDMA) is a technique for efficient movement of data over high-speed transports. RDMA enables a computer to place information directly in another computer's memory with minimal demands on memory bus bandwidth and CPU processing overhead, while preserving memory protection semantics. An RNIC (RDMA-enabled network interface card) is a network interface card that provides RDMA services to the consumer. The RNIC may provide support for RDMA over TCP.
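The idea of direct placement with preserved memory protection can be illustrated with a minimal software sketch. It is only a simulation under stated assumptions: the names `SimulatedRnic`, `MemoryRegion`, `register`, `rdma_write` and `read_local` are hypothetical and do not correspond to any real RNIC interface. The sketch assumes that a buffer is registered in advance under a key, and that an incoming write is placed into that buffer only if the key is known, remote writes are permitted, and the write stays within the registered bounds.

```python
from dataclasses import dataclass


class AccessError(Exception):
    """Raised when a simulated RDMA write violates a region's protection."""


@dataclass
class MemoryRegion:
    # Hypothetical model of a registered buffer; not a real RNIC structure.
    key: int
    buffer: bytearray
    remote_write: bool = True


class SimulatedRnic:
    """Toy stand-in for an RNIC that places data into registered memory."""

    def __init__(self):
        self._regions = {}
        self._next_key = 1

    def register(self, size, remote_write=True):
        """Register a zero-filled buffer and return the key that names it."""
        key = self._next_key
        self._next_key += 1
        self._regions[key] = MemoryRegion(key, bytearray(size), remote_write)
        return key

    def rdma_write(self, key, offset, payload):
        """Place payload directly into the registered buffer after checks."""
        region = self._regions.get(key)
        if region is None:
            raise AccessError("unknown region key")
        if not region.remote_write:
            raise AccessError("remote write not permitted for this region")
        if offset < 0 or offset + len(payload) > len(region.buffer):
            raise AccessError("write falls outside the registered region")
        # Same-length slice assignment: the buffer size never changes.
        region.buffer[offset:offset + len(payload)] = payload

    def read_local(self, key):
        """Return a copy of the buffer as the local consumer would see it."""
        return bytes(self._regions[key].buffer)
```

In this sketch, a consumer would call `register(8)` to obtain a key, and a peer's `rdma_write(key, 2, b"abc")` would then land in the buffer without any intermediate copy, whereas a write that runs past the end of the buffer raises `AccessError` and leaves the memory untouched.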
An RNIC can serve as an iSCSI target or initiator adapter. “Initiator” refers to a SCSI command requester (e.g., a host), and “target” refers to a SCSI command responder (e.g., an I/O device, such as a SCSI drive carrier or a tape device).
Much work has been done to create efficient, scalable and flexible RDMA and iSCSI acceleration solutions, but a successful solution is not trivial. One challenge is that all data processing operations must be handled efficiently while, at the same time, the protocol implementation must remain flexible. Flexibility in protocol implementation is particularly important for TCP, which constantly evolves as its behavior is adapted to changing network speeds, traffic patterns and network infrastructure. Another challenge is the ability to adapt to changes in main CPU speed and in main memory bandwidth and latency.
One example of a prior art solution that uses RNICs for network acceleration relies on embedded processors to handle protocol processing. One or more embedded CPUs are tightly coupled with the data path and touch each incoming and generated packet. Various hardware acceleration engines surround these embedded CPUs and assist in different data processing operations. Such a solution is generally limited by the capabilities of the embedded CPUs, which typically lag behind main CPU technology by several generations. This limits the performance benefits and the lifetime of such solutions. Latency is relatively high, since a packet has to be processed by one or more CPUs before it is sent to the network or placed in memory. To reach high networking rates, multiple CPUs need to be placed on the data path and handle multiple packets simultaneously. This adds latency, makes implementation difficult and increases the cost of the overall solution.
Another prior art solution is a state machine implementation. However, this approach lacks flexibility in protocol processing, which, as mentioned previously, is particularly important for TCP.