1. Field of the Invention
The present invention relates to data storage servers, data networks, and, in particular, to accelerators for data transfer.
2. Description of the Related Art
Many applications require bulk data transfer, and in some applications, such as streaming video, the data transfer requires low latency. Servers used for bulk data transfer, data backup, and streaming video transfer data by reading the data from locally attached storage, or from storage attached over a network such as a SAN (Storage Area Network) or NAS (Network Attached Storage), and sending that data over the network after packaging it in accordance with the appropriate network protocols. FIG. 1 shows a typical processing architecture as might be found in the prior art.
Main processor 101 transfers a number of data streams between devices coupled to input interface 102 and devices coupled to output interface 103. To accomplish this transfer, main processor 101 typically must provide several levels of processing. As shown in FIG. 1, main processor 101 is coupled to several sub-processing modules: cryptography module 104, compression module 105, multi-media module 106, network protocol module 107, and digest generation module 109. These various modules share memory 108, typically a form of cache memory, to enable differing types of processing, or data transformation. Cryptography module 104 might be employed to provide encryption and decryption of data in accordance with any of a number of well-known data security standards. Compression module 105 might be employed to compress data for storage or expand compressed data from storage media. Multi-media module 106 might be employed to decode video and audio data in accordance with any one of the MPEG standards, and network protocol module 107 might be employed to translate between different types of packet protocols. Digest generation module 109 may be used to calculate hash values, or signatures, for strings of data. These sub-processing modules might be implemented in software, hardware, or some combination of software and hardware.
Context switching occurs when a multi-tasking operating system (e.g., an operating system running on main processor 101) stops running one process and starts running another. Many operating systems implement concurrency by maintaining separate environments or “contexts” for each process. The amount of separation between processes, and the amount of information in a context, depends on the operating system, but generally higher level coordination is employed to prevent processes from interfering with each other (e.g., by modifying each other's memory data, or pages). A context switch might simply change values of program counter(s) and stack pointer(s), or might reset the entire processor unit to make a different set of memory pages available. Many systems context switch at an exceptionally high rate in order to present the user with an impression of parallel processing, and to allow processes to respond quickly to external events.
Bulk data transfers, such as video streaming and large file transfers for backup, require considerable time in disk input/output (IO) and network IO. For example, data that is stored in a file format on the disk is read, packaged into network packets, and sent over the network interface. The data goes through various disk IO protocol stacks on the storage side and through various network protocol stacks on the network side. The IO path is multi-level and goes through various layers in the disk driver, applications, and network stack. This transformation of data through various layers requires many CPU cycles, is bound by the CPU IO capacity, and might introduce considerable latency due to the various types of multi-tasked processing of the data streams.
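The read-and-package step described above can be illustrated with a minimal sketch. The 8-byte header layout (sequence number and payload length), the `packetize` and `depacketize` helpers, and the in-memory stand-in for disk contents are assumptions made for illustration only; they do not correspond to any particular network protocol or to the architecture of FIG. 1.

```python
import io
import struct

# Hypothetical 8-byte header: 4-byte sequence number, 4-byte payload length,
# both in network byte order. Not a real protocol header.
HEADER = struct.Struct("!II")


def packetize(source, payload_size=1024):
    """Read a file-like object in chunks and wrap each chunk in a header."""
    seq = 0
    while True:
        payload = source.read(payload_size)
        if not payload:
            break
        yield HEADER.pack(seq, len(payload)) + payload
        seq += 1


def depacketize(packets):
    """Strip the headers and reassemble the original byte stream."""
    out = bytearray()
    for pkt in packets:
        _seq, length = HEADER.unpack_from(pkt)
        out += pkt[HEADER.size:HEADER.size + length]
    return bytes(out)


# Stand-in for data read from disk (2560 bytes).
disk_data = bytes(range(256)) * 10
packets = list(packetize(io.BytesIO(disk_data), payload_size=1000))
restored = depacketize(packets)
```

Even in this simplified form, every byte is copied at least once per layer, which suggests why a multi-level IO path might consume many CPU cycles.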
As is evident from the above discussion, data transformation for streaming data is a challenge for various types of dedicated processing modules within, for example, the server due to variation of input arrival rates, thereby reducing quality of the data transformation performed by each dedicated processing module. An example would be a compression transformation on streaming data, which is normally inefficient when confronted with a variation of input arrival rates. Another example is the cryptographic function often employed in data storage and transfer, where there is a relationship within the data stream, such as chaining in AES-CBC cryptography. The loss of a current state when such a relationship exists requires a restart of the encryption engine, reducing the encryption strength. Stateful stream transformation is required for security transformations such as AES-CBC. Performing the encryption as individual blocks of data, however, would require the AES-CBC encryption initialization vector to be set up for every block, which reduces the security level of the transformation.
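The effect of losing the chaining state can be sketched as follows. This is an illustration only: `toy_block_encrypt` is a hypothetical, insecure keyed permutation that stands in for AES so that the example needs nothing beyond the standard library, and the `CbcStream` class, the key, and the all-zero initialization vector are likewise assumptions. Only the chaining rule, in which each plaintext block is XORed with the previous ciphertext block before encryption, reflects CBC mode.

```python
import hashlib

BLOCK = 16  # block size in bytes, the same as the AES block size


def toy_block_encrypt(key, block):
    """Toy keyed permutation standing in for AES. NOT secure."""
    pad = hashlib.sha256(key).digest()[:BLOCK]
    mixed = bytes(b ^ p for b, p in zip(block, pad))
    return mixed[1:] + mixed[:1]  # rotate left by one byte


class CbcStream:
    """CBC-style encryptor that keeps its chaining value between calls."""

    def __init__(self, key, iv):
        self.key = key
        self.prev = iv  # chaining state: the previous ciphertext block

    def encrypt(self, data):
        assert len(data) % BLOCK == 0, "sketch assumes block-aligned input"
        out = bytearray()
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            xored = bytes(a ^ b for a, b in zip(block, self.prev))
            self.prev = toy_block_encrypt(self.key, xored)
            out += self.prev
        return bytes(out)


key, iv = b"demo key", bytes(BLOCK)  # hypothetical key and all-zero IV
segment = b"A" * 32                  # two identical segments will be sent

# Stateful: one stream object carries the chaining value across segments.
stream = CbcStream(key, iv)
stateful = stream.encrypt(segment) + stream.encrypt(segment)

# Reference: the same data encrypted in a single call.
one_shot = CbcStream(key, iv).encrypt(segment * 2)

# Stateless: the state is lost, so each segment restarts from the same IV.
stateless = CbcStream(key, iv).encrypt(segment) + CbcStream(key, iv).encrypt(segment)
```

In this sketch the stateful output matches the one-shot output, while the stateless output repeats itself for the repeated segment, exposing the repetition in the plaintext. Avoiding that without preserved state would require setting up a fresh initialization vector for each segment.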
Stateless, segmented, or packetized transformation processing is simple, but degrades the quality of the transformation. In the case of compression, the compaction ratio is reduced when the transformation is applied independently to pieces of information. Further, if data is compressed immediately, then there is too little data to be compressed, reducing the compression gains. If the input data is buffered for too long, then the latency of the stream increases. Providing significant compression on streaming data thus presents many challenges to the system designer.
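The loss in compaction ratio can be observed with a minimal sketch using Python's standard `zlib` module. The input is synthetic and deliberately contrived (one hundred identical 64-byte chunks whose redundancy lies entirely between chunks), so the size of the gap shown here is an assumption of the example rather than a measurement of any real stream.

```python
import zlib

# Synthetic input: each chunk has no internal repetition, but the chunks
# repeat one another, so all redundancy lies across chunk boundaries.
chunk = bytes(range(64))
chunks = [chunk] * 100

# Stateless: every chunk is compressed independently of the others.
stateless = [zlib.compress(c) for c in chunks]
stateless_total = sum(len(s) for s in stateless)

# Stateful: one compressor keeps its history window across chunks.
# Z_SYNC_FLUSH emits the pending output after each chunk, so the latency
# is still one chunk, but later chunks can refer back to earlier ones.
comp = zlib.compressobj()
stateful = []
for c in chunks:
    stateful.append(comp.compress(c) + comp.flush(zlib.Z_SYNC_FLUSH))
stateful.append(comp.flush())  # finish the stream
stateful_total = sum(len(s) for s in stateful)

# The concatenated stateful output is a single valid zlib stream.
restored = zlib.decompress(b"".join(stateful))

print("stateless bytes:", stateless_total)
print("stateful bytes: ", stateful_total)
```

Buffering several chunks before each flush would be expected to improve the ratio further, at the cost of the added latency noted above.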