1. Field of the Invention
The present invention is generally related to computer network infrastructure components, such as network gateways, that implement scalable high-speed architectures, and, in particular, to a scalable network gateway processor that uses an efficient internal load-balancing system to optimally implement compute-intensive processing operations on network data packets at wire speed.
2. Description of the Related Art
With the continued growth of the Internet and the proliferation of private distributed intranets, increasing the speed, security, and transactional reliability of network data transmissions remains a fundamental concern in the development of new network infrastructure. The growth in the demands placed on the Internet, particularly in terms of speed, has been even more dramatic. Network speed requirements, even several tiers from the Internet backbone, are rapidly exceeding one gigabit per second (Gbps) and are likely to jump to 4 Gbps, 10 Gbps, and even greater speeds in the very near future. Very high-speed infrastructure components are therefore widely needed throughout the Internet and connected private distributed intranets.
Much of this demand for increased network speed, security, and reliability is driven by the very real efficiencies that can be obtained by extending complex services and capabilities to remote network locations and between private distributed intranets. In most cases, maximizing these efficiencies requires that the network infrastructure connect remote locations and private distributed intranets at wire speed, that is, the maximum fundamental speed of the network connecting any two sites. Network traffic switches and routers are conventionally designed to operate at wire speed. There are, however, many network functions that, as conventionally implemented, operate at only a fraction of current third-tier wire speeds. Network components implementing these functions therefore necessarily impose significant bottlenecks on the network traffic between remote locations and distributed private intranets.
Network components conventionally recognized as creating bandwidth limitations are characteristically required to perform compute-intensive operations. In essence, such network components must limit the rate at which new data packets are received so as not to overwhelm the buffering capacity of the network component while the compute-intensive function is being performed. Even with substantial buffering, the inability to process received data packets in a timely manner results in an overall bandwidth limitation that reduces throughput to a small fraction of the wire speed of the connected infrastructure. The provision of such buffering, however, also raises problems in ensuring the security of the buffered data and transactional reliability through the buffer.
Examples of compute-intensive network components include virtual private network (VPN) and secure sockets layer (SSL) components and components providing packet protocol conversions, such as between the Fibre Channel and iSCSI protocols. Conventional VPN components are used to establish secure virtual private connections over the public Internet between distributed locations. Security for such VPN network transmissions over the Internet is typically implemented using secure Internet protocols, such as the IETF-established IPsec protocols. The in-band encryption protocols of IPsec provide for the encryption of Internet-routed packet data, enabling point-to-point secure delivery of Ethernet-transported data. In many circumstances, such as those typified by corporate intranet environments, local network traffic requirements may easily aggregate to levels requiring gigabit Ethernet VPN connections between distributed locations. While software-only solutions are possible, isolating the compute-intensive data encryption and decryption services of IPsec on a hardware-based accelerator is conventionally recognized as necessary to support bandwidths that are any significant fraction of a gigabit Ethernet connection.
The SSL protocol similarly involves in-band encryption and decryption of significant volumes of network traffic. Although the SSL protocol is implemented as a presentation-level service, which allows applications to use the protocol selectively, Internet sites typically concentrate SSL connections in order to manage repeated transactions between specific clients and servers and thereby give the appearance of a stateful connection. As a result, network traffic loads can again easily aggregate to substantial fractions of a gigabit Ethernet connection. SSL accelerator network components are therefore needed to implement hardware-based encryption and decryption services, as well as related management functions, wherever the network traffic is any significant fraction of a gigabit Ethernet connection.
Unfortunately, conventional network components capable of any significant in-band compute-intensive processing of high-throughput packet data are incapable of achieving gigabit wire-speed performance. Typically, a peripheral accelerator architecture, such as that described in U.S. Pat. No. 6,157,955, is used to perform the compute-intensive functions. Such architectures generally rely on a bus-connected peripheral array of dedicated protocol processors to receive data packets, perform the in-band data processing, and retransmit the packets. Each protocol processor includes a hardware encryptor/decryptor unit, local ingress and egress Ethernet interfaces, and a bridging interface operable through the peripheral bus. Conventionally, each peripheral protocol processor may be capable of on the order of 100 megabits per second (Mbps) of total throughput. The bridging interface is therefore necessary to aggregate the function of the peripheral array. Thus, while significant peak accelerations can be achieved for data packets both received and retransmitted through the local Ethernet interfaces of a single protocol processor, the aggregate array performance is actually limited by the performance of the shared peripheral bus interconnecting the array. High-speed peripheral interconnect buses, such as the conventional PCI bus, are limited to a theoretical maximum throughput of about 4 Gbps. Given the unavoidable effects of bus contention and management overhead, and the multiple bus transactions required to transport a single data packet, the bridged data transfers of as few as four peripheral processors can effectively saturate the peripheral bus. Consequently, the aggregate throughput of such peripheral arrays conventionally falls well below 1 Gbps, more typically running in the range of 250 to 400 Mbps. Such rates clearly fail to qualify as wire speed in current network infrastructures.
There is therefore a need for an efficiently scalable system and architecture capable of performing compute-intensive data packet processing at wire speeds in excess of 1 Gbps and readily extendable to speeds of 4 Gbps, 10 Gbps, and beyond.