1. Field of the Invention
The present invention relates generally to cryptography and, more particularly, to systems and methods that provide high performance cryptography.
2. Description of Related Art
Compared to network data transmission, cryptographic protection of data is a computationally intensive task. There is a need, however, for network-speed cryptography to support the Internet Protocol Security (IPsec) standard for data protection between entities communicating over the Internet. This has led to the development of cryptography units employing multiple cryptography engines whose aggregate performance matches network data rates.
Existing parallel cryptography units employ one of three techniques to achieve higher performance: pipelined, block-parallel, or flow-parallel. FIG. 1 is a diagram of a conventional pipelined system 100 that operates upon blocks of a packet. Each packet is broken into multiple fixed-size data blocks before being operated upon by pipelined system 100.
Pipelined system 100 includes a series of cryptography stages 110 that perform a cryptographic (e.g., encryption or decryption) operation on data blocks of a packet. Each of cryptography stages 110 performs part of the cryptographic operation (f(X)) on a data block and passes the block on to the next stage for the next part of the cryptographic operation. If pipelined system 100 includes four cryptography stages 110, the portions of the cryptographic operation performed by the four cryptography stages 110 may be represented by f1(X), f2(X), f3(X), and f4(X), respectively. In this case, the cryptographic operation may be defined as: f(X)=f4(f3(f2(f1(X)))).
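By way of illustration only, the composition f(X)=f4(f3(f2(f1(X)))) may be sketched in software as follows. The stage functions f1 through f4 below are hypothetical byte-wise placeholders chosen for this sketch; they are not rounds of any real cipher, and the helper name make_pipeline is likewise an assumption.

```python
from functools import reduce


def make_pipeline(stages):
    """Compose stage functions so that stages[0] is applied first."""
    def pipeline(x):
        return reduce(lambda value, stage: stage(value), stages, x)
    return pipeline


# Hypothetical stand-ins for the four stages (operate on one byte value).
def f1(x):
    return (x + 1) & 0xFF


def f2(x):
    return (x * 3) & 0xFF


def f3(x):
    return x ^ 0x5A


def f4(x):
    return ((x << 1) | (x >> 7)) & 0xFF  # rotate left by one bit


f = make_pipeline([f1, f2, f3, f4])

# The pipeline output equals the nested composition for every byte value.
assert all(f(x) == f4(f3(f2(f1(x)))) for x in range(256))
```

In hardware, each stage would hold a different data block at any given time; the sketch shows only the functional decomposition, not the overlap in time.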
FIG. 2 is a diagram of a conventional block-parallel system 200 that operates upon multiple blocks of a packet in parallel. Block-parallel system 200 includes multiple cryptographic sub-units 210 connected in parallel between demultiplexer 220 and multiplexer 230. Demultiplexer 220 delivers a new data block arriving for encryption or decryption to a currently unused cryptographic sub-unit 210. Demultiplexer 220 typically uses a round robin technique to select a sub-unit 210, since the cryptographic operation usually takes the same amount of time for each data block. Each of sub-units 210 performs a cryptographic operation on its data block and outputs the result to multiplexer 230. Multiplexer 230 multiplexes the results from sub-units 210 together into a single stream.
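A minimal software sketch of the round robin selection described above is given below. It assumes that every block takes the same time to process, models each sub-unit as a list, and uses a hypothetical placeholder operation toy_op in place of a real cipher; the function name block_parallel is also an assumption of this sketch.

```python
def toy_op(block):
    """Hypothetical placeholder for the per-block cryptographic operation."""
    return bytes(b ^ 0xFF for b in block)


def block_parallel(blocks, num_subunits, op):
    """Distribute blocks round robin, then merge results in arrival order."""
    subunits = [[] for _ in range(num_subunits)]
    # Demultiplexer: block i goes to sub-unit i mod num_subunits.
    for index, block in enumerate(blocks):
        subunits[index % num_subunits].append((index, op(block)))
    # Multiplexer: recombine the per-sub-unit results into a single stream.
    merged = sorted(
        (item for queue in subunits for item in queue),
        key=lambda item: item[0],
    )
    return subunits, [result for _, result in merged]


blocks = [b"aa", b"bb", b"cc", b"dd", b"ee"]
subunits, stream = block_parallel(blocks, 2, toy_op)
```

Here sub-unit 0 receives blocks 0, 2, and 4 and sub-unit 1 receives blocks 1 and 3. The sketch works only because toy_op treats each block independently.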
FIG. 3 is a diagram of a conventional flow-parallel system 300 that operates upon multiple packets in parallel. Unlike the other systems 100 and 200, flow-parallel system 300 operates upon units of packets rather than units of data blocks. Flow-parallel system 300 includes multiple cryptographic sub-units 310 connected in parallel via input buffers 320 and output buffers 330 to demultiplexer 340 and multiplexer 350.
Demultiplexer 340 uses information within the packet to be encrypted or decrypted to select a sub-unit 310 to process the packet. When IPsec is used, demultiplexer 340 normally uses the Security Association (SA) to which the packet belongs in determining which sub-unit 310 to select. There is typically a different SA for each remote entity with which the network device is communicating. Other characteristics of a packet, such as the TCP connection to which it belongs, can also be used.
Demultiplexer 340 stores the packet in an input buffer 320 of the selected sub-unit 310. Input buffer 320 typically includes a first-in first-out (FIFO) memory. Sub-unit 310 performs a cryptographic operation (e.g., encryption or decryption) on the packet and stores the result in output buffer 330. Output buffer 330 typically includes a FIFO memory. Multiplexer 350 receives packets from output buffers 330 and multiplexes them together into a single stream.
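The flow-parallel arrangement may be sketched as follows, purely for illustration. The sketch assumes that each packet carries an integer SA identifier and that the demultiplexer selects a sub-unit by taking that identifier modulo the number of sub-units; the selection rule, the class name FlowParallelUnit, and the placeholder operation toy_op are assumptions of this sketch rather than features of any particular device.

```python
from collections import deque


def toy_op(payload):
    """Hypothetical placeholder for the per-packet cryptographic operation."""
    return bytes(b ^ 0xFF for b in payload)


class FlowParallelUnit:
    def __init__(self, num_subunits, op):
        self.op = op
        self.input_buffers = [deque() for _ in range(num_subunits)]   # FIFOs
        self.output_buffers = [deque() for _ in range(num_subunits)]  # FIFOs

    def select(self, sa_id):
        """Demultiplexer rule (assumed): same SA always maps to same sub-unit."""
        return sa_id % len(self.input_buffers)

    def enqueue(self, sa_id, payload):
        self.input_buffers[self.select(sa_id)].append((sa_id, payload))

    def run_subunit(self, index):
        """Process every packet waiting in one sub-unit's input FIFO, in order."""
        while self.input_buffers[index]:
            sa_id, payload = self.input_buffers[index].popleft()
            self.output_buffers[index].append((sa_id, self.op(payload)))

    def multiplex(self):
        """Drain the output FIFOs round robin into a single stream."""
        stream = []
        while any(self.output_buffers):
            for buffer in self.output_buffers:
                if buffer:
                    stream.append(buffer.popleft())
        return stream


unit = FlowParallelUnit(2, toy_op)
for sa_id, payload in [(4, b"a1"), (7, b"b1"), (4, b"a2")]:
    unit.enqueue(sa_id, payload)
for index in range(2):
    unit.run_subunit(index)
stream = unit.multiplex()
```

Because both packets of SA 4 pass through the same FIFO, they leave in the order in which they arrived, regardless of how packets of SA 7 are interleaved with them.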
Pipelined and block-parallel systems suffer from an inability to handle common cryptographic modes, where the encryption or decryption of a block is dependent on the completion of the prior block in a series of blocks. In particular, the Cipher Block Chaining (CBC) mode, which is widely accepted as the only current cryptographic mode suitable for the encryption of packet data, has this property. Thus, pipelined and block-parallel systems are not suited for packet-based cryptography employing the CBC mode. The block-parallel technique can also experience difficulties with other modes, such as the “counter” mode, where certain state information must be shared among multiple sub-units working on the same packet.
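The dependency between blocks in CBC mode can be seen in the following sketch, in which each ciphertext block is computed from the current plaintext block combined with the previous ciphertext block. The function toy_encrypt_block is a hypothetical, insecure stand-in for a real block cipher, used here only to make the chaining visible.

```python
def toy_encrypt_block(key, block):
    """Hypothetical, insecure stand-in for a block cipher (byte-wise add)."""
    return bytes((b + k) % 256 for b, k in zip(block, key))


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key, iv, blocks):
    """CBC chaining: C[i] = E(P[i] XOR C[i-1]), with C[-1] = IV."""
    previous = iv
    ciphertext = []
    for block in blocks:
        # This block cannot start until the previous ciphertext block exists.
        previous = toy_encrypt_block(key, xor_bytes(block, previous))
        ciphertext.append(previous)
    return ciphertext


key = bytes([1, 2, 3, 4])
iv = bytes(4)
original = cbc_encrypt(key, iv, [b"AAAA", b"AAAA"])
modified = cbc_encrypt(key, iv, [b"BAAA", b"AAAA"])

# Identical plaintext blocks encrypt differently, and a change to the first
# block propagates into the second ciphertext block.
assert original[0] != original[1]
assert original[1] != modified[1]
```

The loop-carried variable previous is the reason the blocks of one packet cannot be encrypted by separate stages or sub-units at the same time.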
It may be possible to modify the block-parallel technique so that all data blocks from a single packet are assigned, in sequence, to the same sub-unit. Assuming that all sub-units have similar performance, this means that short packets (with few data blocks) will finish faster than long packets (with many data blocks), resulting in packets becoming out of order, as short packets get ahead of longer ones. Packet reordering is considered a highly undesirable behavior because it degrades the throughput of the widely used TCP. Thus, such a modified block-parallel technique has significant disadvantages that prevent its successful use.
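The reordering effect can be illustrated with a simple timing model. The sketch below assumes that all packets are already waiting at time zero, that each sub-unit processes one data block per time unit, and that each packet is assigned to the sub-unit that becomes free earliest; these assumptions and the function name completion_order are illustrative only.

```python
def completion_order(packet_lengths, num_subunits):
    """Return packet ids in the order in which their processing completes."""
    free_at = [0] * num_subunits  # time at which each sub-unit becomes idle
    finished = []
    for packet_id, length in enumerate(packet_lengths):
        # Assign the whole packet to the sub-unit that is free earliest.
        unit = min(range(num_subunits), key=lambda u: free_at[u])
        free_at[unit] += length
        finished.append((free_at[unit], packet_id))
    return [packet_id for _, packet_id in sorted(finished)]


# One long packet (10 blocks) followed by two short packets (1 block each).
order = completion_order([10, 1, 1], num_subunits=2)
print(order)  # [1, 2, 0]
```

Under these assumptions the two short packets complete at times 1 and 2, while the long packet that arrived first completes at time 10, so it leaves last.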
Flow-parallel systems can handle CBC and similar feedback modes because all related data blocks from a single packet are handled by the same sub-unit. These systems also avoid the problems of packet reordering because all packets from a single flow are processed in order through the same sub-unit. Reordering of packets between flows is considered acceptable behavior because it does not affect TCP throughput. Flow-parallel systems, however, limit the maximum throughput on any flow to the maximum performance of a single sub-unit. As a result, while large aggregate data rates can be achieved for many flows through a single cryptography device, individual flows cannot approach the full throughput of a high bandwidth network interface.
Also, flow-parallel systems can suffer from traffic imbalances among the different sub-units. Some sub-units may go unused or underused, either because no flows are currently assigned to them or because the assigned flows do not send enough traffic to fill them, while other sub-units are oversubscribed with several high bandwidth flows that together exceed their capacity. Because it is difficult to determine, a priori, what the bandwidth of a given flow will be, the assignment of flows to sub-units will generally be sub-optimal.
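Both limitations of the flow-parallel technique, the per-flow throughput cap and the traffic imbalance, can be illustrated numerically. In the sketch below, the flow names, the rates in Mb/s, the sub-unit capacity of 1000 Mb/s, and the assignment of flows to sub-units are all hypothetical values chosen for illustration.

```python
def carried_throughput(flow_rates, assignment, num_subunits, capacity):
    """Return (offered, carried) load per sub-unit, in the same units as rates."""
    offered = [0] * num_subunits
    for flow, rate in flow_rates.items():
        offered[assignment[flow]] += rate
    # A sub-unit cannot carry more than its own capacity.
    carried = [min(load, capacity) for load in offered]
    return offered, carried


flow_rates = {"a": 900, "b": 800, "c": 100}   # hypothetical rates in Mb/s
assignment = {"a": 0, "b": 0, "c": 1}         # hypothetical flow placement
offered, carried = carried_throughput(
    flow_rates, assignment, num_subunits=4, capacity=1000
)
print(offered)       # [1700, 100, 0, 0]
print(carried)       # [1000, 100, 0, 0]
print(sum(carried))  # 1100
```

In this example the four sub-units have an aggregate capacity of 4000 Mb/s, yet only 1100 Mb/s of the 1800 Mb/s offered is carried, because two high bandwidth flows share sub-unit 0 while sub-units 2 and 3 sit idle. In the same way, a single flow offering more than 1000 Mb/s could never be carried in full, however many sub-units are present.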
Therefore, there is a need for network-speed cryptography that supports current security protocols, such as IPsec, for data protection between entities communicating over a network at full line rate with no reordering.