1. Field of Invention
The invention relates to an IPSec processor and, in particular, to a mechanism for high-speed IPSec processing.
2. Related Art
IP Security (IPSec) apparatuses are used to secure information propagated over a public network. Several applications, including Virtual Private Networks (VPN) and cable modems, have adopted IPSec as the standard for their security needs. The processing throughput of IPSec apparatuses covers a wide range, from the order of hundreds of kilobits per second to several gigabits per second. There are several ways to implement an IPSec apparatus. One is a full-software solution. It works functionally, but its performance is only about 1 megabit per second or lower, which is too slow to be acceptable in an era of rapid network growth. The development of WDM and Gigabit Ethernet has pushed network bandwidth from megabits to gigabits per second.
FIG. 1 shows a block diagram of a conventional IPSec system structure. The IPSec system consists of a CPU 100, a Memory 110, and an Accelerator 120. Here, the IPSec Accelerator 120 is employed only to reduce the computation load of the CPU 100 in 3DES and HMAC operations. The CPU 100 has to take care of all other functions, including parsing, packet classification, database maintenance, pre-operation (e.g. packet forming and trailer making), post-operation, packet I/O, and IP layer processing (e.g. fragmentation and reassembly). In addition, it has to form a context for the IPSec Accelerator 120. The throughput is very limited because of this large overhead. The transfer speed is also limited by the rise time of the Memory 110 and the resulting long CPU read/write cycle. This structure is the easiest to implement, but the system performance is quite low even when a high-speed accelerator is employed.
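The limit that the CPU overhead places on the structure of FIG. 1 can be illustrated with a simple per-packet timing model. The following is a minimal sketch only; the function name, the packet size, and the timing figures are hypothetical values chosen for illustration and are not taken from any measured system.

```python
def throughput_bps(packet_bits, cpu_overhead_s, crypto_s):
    """Throughput when CPU work and accelerator work happen one after another.

    cpu_overhead_s: time the CPU spends per packet on parsing, classification,
                    context forming, and so on (hypothetical figure).
    crypto_s:       time the accelerator spends per packet (hypothetical figure).
    """
    return packet_bits / (cpu_overhead_s + crypto_s)


PACKET_BITS = 1500 * 8      # one 1500-byte packet (assumed size)
CPU_OVERHEAD_S = 100e-6     # assumed 100 microseconds of CPU work per packet

# Making the accelerator faster helps less and less: the CPU term dominates.
for crypto_s in (100e-6, 10e-6, 1e-6, 0.0):
    rate = throughput_bps(PACKET_BITS, CPU_OVERHEAD_S, crypto_s)
    print(f"crypto {crypto_s * 1e6:6.1f} us -> {rate / 1e6:7.2f} Mbit/s")
```

Under these assumed figures, even an infinitely fast accelerator cannot raise the throughput above the packet size divided by the CPU overhead, which is the behavior described for the conventional structure.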
FIG. 2 shows a conventional IPSec processor with an embedded CPU and Memory, which is an extension of the IPSec system shown in FIG. 1. The IPSec processor is constructed of an embedded CPU 200, an embedded Memory 210, and an accelerator (or Crypto Engine 220). The embedded design does increase the transfer speed because of its higher data transfer rate. Yet it still has to deal with the large overhead described above. Hence, it remains difficult to achieve a very high throughput, on the order of gigabits per second.
FIG. 3 shows the traditional pipeline concept. Packets are delivered through n stations, which deal with packet input, trailer making, header making/modifying, operation, post-operation, and packet output, respectively. The pipeline speeds up processing by keeping all the stations busy; a station works on the output of its previous station as soon as that output is available. However, there are two problems:
1) A packet has to check in to and check out of every one of the stations. Hence, additional buffers are needed, and checking in and checking out also take time.
2) It takes extra time to feed the data back to the beginning stage in an SA (security association) bundle case; the same packet has to be processed again, and its data has to be fed back for the bundled SA processing.
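The feedback penalty for a bundled SA in such a pipeline can be illustrated with a small tick-based simulation. This is a minimal sketch under stated assumptions: every station takes exactly one tick, buffering and check-in/check-out costs are ignored, a fed-back packet re-enters ahead of new packets, and the function name and parameters are hypothetical.

```python
from collections import deque


def simulate_pipeline(num_packets, num_stations, passes_per_packet=1):
    """Return the number of ticks needed to finish all packets.

    passes_per_packet models a bundled SA: a packet that needs more than one
    pass is fed back to the first station after leaving the last one.
    """
    # Each queue entry is the number of passes a packet still has to make.
    waiting = deque([passes_per_packet] * num_packets)
    stations = [None] * num_stations
    done = 0
    ticks = 0
    while done < num_packets:
        ticks += 1
        # Station 0 is always empty at the start of a tick, so admit one packet.
        if waiting:
            stations[0] = waiting.popleft()
        # All stations work for one tick, then every packet moves one station on.
        leaving = stations[-1]
        stations = [None] + stations[:-1]
        if leaving is not None:
            if leaving > 1:
                waiting.appendleft(leaving - 1)  # feed back for the next SA
            else:
                done += 1
    return ticks


print(simulate_pipeline(10, 6))                       # single SA per packet
print(simulate_pipeline(10, 6, passes_per_packet=2))  # two bundled SAs per packet
```

In this model a single-pass run of k packets through n stations takes n + k - 1 ticks, while a bundled case takes noticeably longer because each fed-back packet must travel the full length of the pipeline again and can leave bubbles behind it.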
In the prior art, the accelerator (or Crypto Engine 220) may be duplicated several times so that the crypto stage gains a higher performance capability. In other words, a parallel technique is used in that design. This is what current commercial products do in order to increase IPSec processing performance. Some advanced commercial products add a few features, such as checksum and mutable-bit processing, to their devices. There are, however, several drawbacks to this kind of parallel processing:
1) It is very time consuming, or even difficult, to deal with a bundled SA case, since the whole packet has to be fed back for the bundled SA; all the processes, from parsing and classification through to output, have to be repeated.
2) The utilization of the Crypto Engine 220 is not high. The Crypto Engine 220 has to provide encryption, authentication, and encryption plus authentication, so the encryption engine and the authentication engine are chained together to provide all three service styles. Hence, the whole Crypto Engine 220 can service only one packet at a time, with one of the three service styles; it cannot service two packets at once. A “collision” problem also reduces the utilization of the Crypto Engines 220. When two Crypto Engines 220 finish their jobs at about the same time, one of them has to output after the other. No input is allowed before the output is complete; therefore no input is allowed for either of the two engines, and one of them has to idle even longer.
3) The control is complicated. First, a context has to be built for the Crypto Engine 220 (or accelerator). Second, the post-processing requires extra effort.
4) It is not efficient to verify the authenticity of incoming packets, because verification can be done only after the crypto operation is completed, and the crypto operation is the bottleneck of the whole process. A long time may be spent on a decryption operation only for the packet to turn out to be a fake one.
5) The bottleneck may shift to the pre-operation, which includes packet forming and context making, because the crypto engines can be duplicated as many times as desired while there is only one pre-operation unit.
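The "collision" drawback of duplicated crypto engines can be illustrated by serializing the engine outputs on one shared output port. The following is a minimal sketch only: the function name, the time units, and the example figures are hypothetical, and the sketch models only the wait for the output port. The additional idle time caused by input being blocked during output is not modeled.

```python
def output_collision(finish_times, output_time):
    """Serialize engine outputs on a single shared output port.

    finish_times: time at which each engine finishes its crypto job.
    output_time:  time one engine needs to write its result out.
    Returns a list of (engine_index, output_start, idle_wait) in service order.
    """
    port_free_at = 0
    schedule = []
    # Serve engines in the order in which they finish.
    order = sorted(range(len(finish_times)), key=lambda i: finish_times[i])
    for i in order:
        start = max(finish_times[i], port_free_at)
        schedule.append((i, start, start - finish_times[i]))
        port_free_at = start + output_time
    return schedule


# Two engines finishing 2 time units apart, each needing 10 units to output:
print(output_collision([100, 102], 10))
# Two engines finishing far apart do not collide:
print(output_collision([100, 200], 10))
```

In the first call the second engine has to wait for the port even though its result is ready, which is the idle time the collision introduces; in the second call neither engine waits.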