Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is the processor or processing engine, which contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a processor having internal registers for use with operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the processor. When implementing these functions, the processor (or central processing unit, CPU) generally processes "transient" data residing in a data memory in accordance with the instructions.
A high-performance processing engine configured for use in, e.g., an intermediate network station such as a router or switch may be realized by using a number of identical processors to perform certain tasks in parallel. For a purely parallel multiprocessor architecture, each processor may have shared or private access to non-transient data (such as "table" data contained in forwarding and routing tables, statistics, access filters and/or queuing information) stored in an external memory coupled to the processor. Another type of non-transient data stored in the external memory may be encryption key data used to encrypt the transient (e.g., message) data processed by the switch within a data communications network.
Data encryption is critical in data communications networks because reliable communication between applications executing on stations of the network requires security. There are many mechanisms that provide security for messages transported over a data communications network, such as the Internet. Industry standards describe protocol conventions used when exchanging encrypted information. For example, RFC 1827, which specifies the IP Encapsulating Security Payload (ESP), describes two modes for exchanging data: tunnel mode and transport mode. To support such conventions, these stations, including network switches, need efficient protocol and data encryption processing solutions.
Software-based applications executing on a general-purpose processor of a switch provide a data encryption processing solution that depends upon protocol decode of messages by the processor. However, conventional encryption algorithms, such as the Data Encryption Standard (DES), execute slowly on a general-purpose processor. Hardware-based encryption modules that interface with the processor are available, but they still require software-based protocol decode support. The resulting interface between software executing on the processor and the hardware encryption module substantially reduces the overall performance of the switch. Performance is adversely affected primarily because of the memory and input/output (I/O) bandwidth consumed by the encryption module when interacting with components of the switch; in addition, the overhead required to pass encryption keys and data to and from the module, e.g., over a bus, contributes to the reduction in performance.
FIG. 1 is a schematic block diagram of a conventional switch 100 configured with conventional data encryption/decryption capabilities. The switch comprises a plurality of components including a central processing unit (CPU) 110, a memory 120, a network adapter 130 and an external interface (I/F) circuit 140 interconnected via a system bus 150. The network adapter 130 provides a connection to network media that enables transmission and reception of data messages (hereinafter "packets") that require data encryption/decryption. For example, a packet received from the network media is accumulated by the adapter, which then performs a direct memory access (DMA) operation over the system bus 150 to store the packet in an allocated buffer of the memory 120. Thereafter, the CPU processes the packet to determine whether it needs encryption or decryption. An encryption/decryption (hereinafter "DES") module 160 is directly coupled to the external interface circuit 140 to provide the data encryption/decryption capabilities of the switch.
Since different protocols define different ways of encoding packets (see RFC 1827), hardware implementation of data encryption/decryption is generally complex and inefficient. That is, the overall encryption/decryption function provided by the DES hardware module 160 introduces significant latency as a result of the software-hardware interface of the switch 100. For example, prior to actual encryption/decryption operations, software executing on the CPU 110 must identify the portion of the packet (i.e., the proper header) to decode in order to determine the appropriate protocol and encryption algorithm. If the packet has been tunneled, such protocol decoding may not start at the beginning of the packet, thus requiring the CPU to parse the packet to retrieve the proper header (e.g., an IP header) and the data ("payload") needing encryption/decryption. Software protocol processing may therefore contribute to the latency associated with the encryption/decryption function.
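The parsing step described above can be illustrated with a minimal sketch. It assumes IPv4 only and plain IP-in-IP encapsulation (IP protocol number 4), chosen purely for illustration; the function names `ipv4_header_len`, `find_inner_header` and `make_ipv4_header` are hypothetical, and no header fields other than version, IHL and protocol are checked. The point is only that, once a packet is tunneled, the header the CPU must decode sits at a nonzero offset that software has to compute header by header.

```python
IPPROTO_IPIP = 4  # IP-in-IP encapsulation (IANA protocol number 4)


def ipv4_header_len(pkt: bytes, offset: int) -> int:
    """Return the IPv4 header length in bytes, taken from the IHL field."""
    version_ihl = pkt[offset]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    return (version_ihl & 0x0F) * 4  # IHL counts 32-bit words


def find_inner_header(pkt: bytes) -> tuple[int, int]:
    """Walk IP-in-IP encapsulations and return (header_offset, payload_offset)
    of the innermost IPv4 header (hypothetical helper)."""
    offset = 0
    while True:
        hlen = ipv4_header_len(pkt, offset)
        protocol = pkt[offset + 9]  # protocol field is byte 9 of the header
        if protocol != IPPROTO_IPIP:
            return offset, offset + hlen
        offset += hlen  # tunneled: another IPv4 header follows


def make_ipv4_header(protocol: int) -> bytes:
    """Build a minimal 20-byte header; all other fields are left zero."""
    return bytes([0x45]) + bytes(8) + bytes([protocol]) + bytes(10)
```

For example, `find_inner_header(make_ipv4_header(4) + make_ipv4_header(50) + b"payload")` returns `(20, 40)`: decoding starts 20 bytes into the packet rather than at its beginning, and the payload begins at byte 40.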
Specifically, decode processing of the IP header identifies the appropriate encryption keys 122 stored in memory 120 that are needed for encryption/decryption operations in accordance with RFC 1827. The keys are organized as data structures, such as tables 125, in the memory and are generally stored as 64-bit fields within the tables. A common type of encryption operation is a "triple DES" operation that typically uses three keys. To perform such encryption, either the CPU moves the key data from memory 120 over the system bus 150 to the interface circuit 140, which then communicates with the DES module 160, or, alternatively, the interface circuit 140 moves the data to the DES module 160 as part of a DMA operation.
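Why triple DES needs three keys can be seen from the construction in common use, the encrypt-decrypt-encrypt (EDE) form of the Triple Data Encryption Algorithm specified in ANSI X9.52 and NIST SP 800-67. The passage does not name a specific variant, so this is given as the usual case rather than the one assumed by the switch. Here $P$ and $C$ are 64-bit cleartext and ciphertext blocks, $E_K$ and $D_K$ are single-DES encryption and decryption under key $K$, and $K_1, K_2, K_3$ are the three 64-bit keys (56 effective bits plus 8 parity bits each):

```latex
\begin{aligned}
C &= E_{K_3}\bigl(D_{K_2}\bigl(E_{K_1}(P)\bigr)\bigr) \\
P &= D_{K_1}\bigl(E_{K_2}\bigl(D_{K_3}(C)\bigr)\bigr)
\end{aligned}
```

Each 64-bit block therefore passes through DES three times, and all three keys must be resident in the DES module before the block can be processed.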
Notably, the data moved to the DES module includes the payload ("cleartext" or "ciphertext"), the keys and, possibly, a previous session state. As for the latter, the states of different encryption modes/packets are saved at the DES module when multiplexing various packets over the system bus in an attempt to achieve "wire speed" encryption/decryption, which is typically performed 64 bits at a time. Generally, the keys are loaded only once per packet at the DES module, whereas loading the packet itself may require many 64-bit data transfers. When multiplexing packets over the bus, however, constant reloading of the keys may be required. Thus, a total of at least four 64-bit system bus transfers (three keys and one data) is required to load the DES module in order to perform one triple DES operation. Because the position of the payload within the packet depends on the preceding protocol decode, the cleartext/ciphertext data transfer may cause a non-aligned access that must subsequently be aligned, thereby creating additional latency as well as consuming a significant amount of bus bandwidth. Traversing the external interface 140 between the bus and the DES module adds further significant latency.
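The transfer counts above can be tallied with a small model. This is a sketch under stated assumptions, not a description of any particular bus: memory is byte-addressed with 64-bit (8-byte) bus words, only the transfers that load the DES module are counted (returning results and the cost of realigning data are ignored), and "multiplexed" is taken as the worst case in which all three keys are resent for every 64-bit DES block. The names `payload_words` and `load_transfers` are hypothetical.

```python
import math

WORD_BYTES = 8  # one 64-bit system bus transfer
TDES_KEYS = 3   # triple DES uses three 64-bit keys


def payload_words(offset: int, length: int) -> int:
    """64-bit bus words touched by a payload starting at byte `offset`."""
    if length == 0:
        return 0
    first = offset // WORD_BYTES
    last = (offset + length - 1) // WORD_BYTES
    return last - first + 1


def load_transfers(offset: int, length: int, multiplexed: bool) -> int:
    """Bus transfers needed to load keys and payload into the DES module."""
    blocks = math.ceil(length / WORD_BYTES)  # 64-bit DES blocks to process
    if multiplexed:
        # Worst case assumed here: keys are resent with every DES block.
        key_transfers = TDES_KEYS * blocks
    else:
        # Keys are loaded once per packet.
        key_transfers = TDES_KEYS
    return key_transfers + payload_words(offset, length)
```

With these assumptions, a single aligned 8-byte block costs `load_transfers(0, 8, True) == 4` transfers, matching the three-keys-plus-one-data figure. A 64-byte aligned payload costs 11 transfers when keys are loaded once and 32 when they are resent per block. Starting the same payload at byte offset 34 (an arbitrary unaligned example) makes it span 9 bus words instead of 8, so even the keys-once case rises to 12 transfers before any realignment work.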
Processing latency is further introduced by the coding constructs needed to load the DES hardware module 160 with the required information and thereafter retrieve the results upon completion of the encryption/decryption operations. As for the latter, software executing on the CPU 110 is notified of hardware completion either by polling the hardware module or by an interrupt to the CPU. Since polling requests issued by the CPU are translated into bus read operations, polling consumes additional bus bandwidth, whereas interrupts may actually improve bus efficiency. However, an interrupt initiated by the hardware module causes the state of the processor to change while the interrupt is serviced; in this case, the latency of the interrupt service routine is added to the overall latency.
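The trade-off between the two notification methods can be sketched in software. The model below is a hypothetical stand-in, not a driver for any real device: `FakeDesModule` simulates the DES module 160 with a background thread, each call to `read_status` stands for one bus read of a status register, and the `on_complete` callback stands for the interrupt service routine. Polling accumulates status reads while the module works; the interrupt-style wait issues none, but the callback runs on the completion path, which is where interrupt service latency would be added.

```python
import threading
import time


class FakeDesModule:
    """Hypothetical software stand-in for the DES hardware module."""

    def __init__(self, work_seconds: float):
        self._work = work_seconds
        self._done = threading.Event()
        self.status_reads = 0  # each read models one bus read transaction

    def start(self, on_complete=None):
        def run():
            time.sleep(self._work)  # pretend to encrypt/decrypt
            self._done.set()
            if on_complete is not None:
                on_complete()  # models raising an interrupt to the CPU
        threading.Thread(target=run, daemon=True).start()

    def read_status(self) -> bool:
        self.status_reads += 1
        return self._done.is_set()


def wait_by_polling(dev: FakeDesModule, interval: float = 0.001) -> int:
    """Poll the status register until done; return the bus reads issued."""
    dev.start()
    while not dev.read_status():
        time.sleep(interval)
    return dev.status_reads


def wait_by_interrupt(dev: FakeDesModule) -> int:
    """Block until the completion callback fires; return the bus reads issued."""
    irq = threading.Event()
    dev.start(on_complete=irq.set)  # the callback plays the role of the ISR
    irq.wait()  # no status reads cross the bus while waiting
    return dev.status_reads
```

Running `wait_by_polling(FakeDesModule(0.05))` typically reports dozens of status reads, while `wait_by_interrupt(FakeDesModule(0.05))` reports none; the exact polling count depends on timer resolution and scheduling.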
Once encryption/decryption operations have completed at the DES module, the ciphertext/cleartext (and session state) are returned to the memory 120, where the data is staged prior to transmission through the adapter 130 over the network. Typically, the ciphertext/cleartext is staged in memory until an entire frame descriptor block is constructed for transmission as a frame/packet over the network.
It is apparent that data encryption/decryption by the hardware module arrangement described above entails substantial data movement over the system bus of the switch, thereby causing a “bottleneck” on the bus that adversely affects the performance of the switch. The present invention is directed towards reducing the amount of data movement and resulting latency in an intermediate station such as a switch when performing data encryption/decryption functions.