The complexity and sophistication of operating systems, application software, networking technology, and the like continue to increase at dramatic rates, resulting in increased computer functionality. This increased functionality often results in increased Central Processing Unit (CPU) load (hereinafter also referred to as “CPU overhead”) due to the additional duties that must be performed by the CPU to implement the increased functionality.
One area where the increase in CPU overhead is readily apparent is networked applications, where network speeds are increasing due to the growth in high bandwidth media. Network speeds may even rival the CPU speed and the access speeds for local memory at host computers. These networked applications further burden the host processor due to the layered architecture used by most operating systems, such as the seven-layer Open Systems Interconnection (OSI) model or the layered model used by the Windows operating system.
As is well known, such a model is used to describe the flow of data between the physical connection to the network and the end-user application. The most basic functions, such as putting data bits onto the network cable, are performed at the bottom layers, while functions attending to the details of applications are at the top layers. Essentially, the purpose of each layer is to provide services to the next higher layer, shielding the higher layer from the details of how services are actually implemented. The layers are abstracted in such a way that each layer believes it is communicating with the same layer on the other computer.
Various functions that are performed on a data packet as it proceeds between layers can be software intensive, and thus often require a substantial amount of CPU and memory resources. For instance, certain functions that are performed on the packet at various layers are extremely CPU intensive, such as packet checksum calculation and verification, message digest calculation, TCP segmentation, TCP retransmission and acknowledgment (ACK) processing, packet filtering to guard against denial of service attacks, and User Datagram Protocol (UDP) packet fragmentation. As each of these functions is performed, the resulting demands on the CPU can greatly affect the throughput and performance of the overall computer system.
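By way of illustration only, one of the per-packet functions named above, checksum calculation and verification, may be sketched as follows. The sketch assumes the 16-bit one's-complement Internet checksum described in RFC 1071; the function name internet_checksum is hypothetical and is not taken from this document. It shows that every byte of the packet must be touched by the processor, which is why the cost grows with throughput.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071 style).

    Illustrative sketch only; real network stacks use optimized or
    hardware-assisted implementations.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte

    total = 0
    for i in range(0, len(data), 2):
        # Combine two bytes into one big-endian 16-bit word and accumulate.
        total += (data[i] << 8) | data[i + 1]

    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # The checksum is the one's complement of the folded sum.
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")
    checksum = internet_checksum(payload)
    # Verification: summing the data together with its checksum yields zero.
    assert internet_checksum(payload + checksum.to_bytes(2, "big")) == 0
```

Verification on the receiving side repeats the same pass over the data, so the work is performed once per packet in each direction.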
As on-line sales and telecommuting increase, the security of data transmission has become increasingly important. This has required additional security functions to be performed on the data packet. These security functions include ensuring the privacy of data during its transmission across the different communication networks, ensuring the data packet is coming from an authenticated end host, application, or user, and ensuring that the data has not been modified during its transmission across different communication networks. For the purposes of this application, secure data transmission is defined as data transmission where any combination of one or more of privacy, authentication, or data integrity can be assumed end-to-end.
Several standards have been developed to facilitate secure data transmission over data networks. These standards provide a method for remote systems to establish a secure session through message exchange and calculations, thereby allowing sensitive data being transmitted across the different communication networks to remain secure and free from tampering. For example, Internet Protocol Security (“IPSec”) may be used to establish secure host-to-host pipes, user level security, application level security, connection level security, and virtual private networks over the Internet. IPSec defines a set of specifications for cryptographic encryption and authentication. IPSec also supports several algorithms for key exchange, including an Internet Key Exchange (“IKE”) algorithm for establishing keys for secure sessions between applications.
The encryption and decryption of data (e.g., SSL encryption and IP Security encryption) is CPU intensive. For example, the current estimate for TCP send processing is about 2 cycles per byte of data transferred. For IPSec authentication, the CPU overhead varies, but in round numbers is approximately 15 cycles per byte. For encryption, the CPU overhead rises to between about 25 cycles per byte and about 145 cycles per byte.
From the above numbers, it can be seen that performing the tasks to establish and maintain a secure session is CPU intensive. When the host processor performs all of these tasks, system performance suffers because the tasks consume resources that would otherwise be available for other work. The decrease in system performance impacts a network and users in various ways, depending on the function of the network element (e.g., routing, switching, serving, managing networked storage, etc.).
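To put the cycles-per-byte estimates given above in perspective, the following sketch converts them into the fraction of a processor consumed at a given line rate. The 1 gigabit per second link and the 1 GHz processor clock are assumptions chosen purely for illustration and do not appear in this document; the function name cpu_fraction is likewise hypothetical.

```python
# Cycles-per-byte estimates quoted in the text.
CYCLES_PER_BYTE = {
    "TCP send processing": 2,
    "IPSec authentication": 15,
    "encryption (low estimate)": 25,
    "encryption (high estimate)": 145,
}


def cpu_fraction(cycles_per_byte: float,
                 link_bits_per_sec: float,
                 cpu_hz: float) -> float:
    """Fraction of one CPU needed to keep up with the link at line rate."""
    bytes_per_sec = link_bits_per_sec / 8
    return cycles_per_byte * bytes_per_sec / cpu_hz


if __name__ == "__main__":
    LINK_BITS_PER_SEC = 1e9  # assumed: 1 Gbit/s network link
    CPU_HZ = 1e9             # assumed: 1 GHz host processor
    for task, cost in CYCLES_PER_BYTE.items():
        share = cpu_fraction(cost, LINK_BITS_PER_SEC, CPU_HZ)
        print(f"{task}: {share:.3f} of one CPU")
```

Under these assumed figures, TCP send processing alone would occupy about a quarter of the processor, authentication would require nearly two processors' worth of cycles, and encryption would require roughly three to eighteen.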
As the demand on CPU resources grows, the capability and throughput of computer hardware peripherals such as network interface cards (NICs) and the like are also increasing. These peripherals are often equipped with a dedicated processor and memory that are capable of performing many of the tasks and functions that are otherwise performed by the CPU.
Coprocessors have been developed to offload some of the tasks from the host processor. Some coprocessors have been developed to perform a specific primitive task for the host processor (e.g., hash data). However, the addition of a task-specific coprocessor does not offload from the host processor a significant amount of the secure session establishment tasks. One alternative is to add multiple coprocessors to a system, each coprocessor performing a different task. Such an alternative is limited by physical constraints (e.g., number of slots on a computer in which cards are connected) and introduces the problem of multiple communications between the host processor and the multiple coprocessors.
Other coprocessors have been developed to perform more than one of the tasks required to establish a secure session. As an example of this, assume a coprocessor can perform a cryptographic operation (i.e., an encrypt or decrypt), a key generation operation, and a hash operation. When a server has received a request to establish a secure session, the server must call the coprocessor to decrypt a pre-master secret received from a client. To generate a master secret and key material, the host processor must then make approximately twenty calls to the coprocessor (one for each hash operation). As illustrated by this example, a coprocessor that can perform multiple tasks does not solve the issue of resource consumption from multiple communications between the host processor and the coprocessor.
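The communication cost described in the preceding example can be illustrated with a toy model that counts host-to-coprocessor calls. Everything in the sketch is hypothetical: the CountingCoprocessor class, its hash and hash_chain methods, and the use of SHA-256 as a stand-in hash are assumptions made for illustration, and the chained hashing shown is not an actual key derivation procedure of any secure session protocol. The single-call variant is included only for contrast and is not asserted to be the approach of any existing device.

```python
import hashlib


class CountingCoprocessor:
    """Hypothetical coprocessor model that counts calls made by the host."""

    def __init__(self) -> None:
        self.calls = 0

    def hash(self, data: bytes) -> bytes:
        # One host-to-coprocessor call per primitive hash operation.
        self.calls += 1
        return hashlib.sha256(data).digest()

    def hash_chain(self, seed: bytes, rounds: int) -> bytes:
        # One call that performs all rounds on the coprocessor side.
        self.calls += 1
        out = seed
        for _ in range(rounds):
            out = hashlib.sha256(out).digest()
        return out


def derive_per_call(dev: CountingCoprocessor, seed: bytes,
                    rounds: int = 20) -> bytes:
    """Host drives every hash operation itself: `rounds` separate calls."""
    out = seed
    for _ in range(rounds):
        out = dev.hash(out)
    return out


if __name__ == "__main__":
    seed = b"placeholder pre-master secret"  # placeholder input
    per_call, batched = CountingCoprocessor(), CountingCoprocessor()
    assert derive_per_call(per_call, seed) == batched.hash_chain(seed, 20)
    print(per_call.calls, "calls versus", batched.calls, "call")
```

Both paths compute the same result in this model; the difference lies only in how many times the host processor must communicate with the coprocessor.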
Accordingly, there is a need in the art to reduce the overhead associated with offloading IPSec functions from a host processor to another processor, such as one on a peripheral device. In particular, there is a need for solutions for offloading IPSec functions while simultaneously maintaining processing requirements for the individual connections.