1. Field of the Invention
The present invention relates in general to the elimination of packet and control memory bottlenecks in communication networks and more specifically to a Network Processor (NP) architecture. The architecture and algorithm incorporated in the Network Processor Integrated Circuit (IC) of the invention provide orders of magnitude faster performance and the scalability needed to meet the explosively increasing demand for bandwidth.
2. Background Information
Almost all communications equipment uses one or more network processors. Communications equipment includes but is not limited to: high-speed routers, switches, intelligent optical devices, DSLAMs, broadband access devices, voice gateways, etc. The equipment may deploy the NP in a centralized or distributed manner. Distributed NPs are popular for high-speed and intelligent communications equipment. For lower and mid-range equipment, a centralized NP is very attractive since it keeps the equipment price very low. In complex, high-speed intelligent broadband equipment, the NPs (such as those manufactured by Intel or Lucent) are distributed and each line card may contain one or more NPs. FIG. 1 illustrates the NP physical location within the line card and its logical functions within the networking stack.
In a typical line card, the fiber-optic cable is connected to the optical module. The other end of the fiber-optic line typically connects to an external router or another communications device. Among other functions, the optical module converts the optical signal into an electrical signal and presents the electrical signal to the framer. The framer performs functions such as framing, error checking and statistics gathering. The framer provides the framed information to the classifier, which performs a flow classification function. The classifier is optional: most equipment does not require classification beyond layer three or four, and most network processors perform classification at least up to layer three or four. The network processor processes the information and forwards it to the appropriate line card across the system's backplane using the switch fabric. Logically, the optical module and framer perform layer one of the OSI stack, whereas the NP and optional classifier handle layers 2 through 7. Processing intelligence, power and bandwidth capacity are the biggest differentiating factors between Network Processors.
The single biggest factor limiting the ability of NPs to scale and meet increasing Internet bandwidth demand is Moore's law. Under Moore's law, semiconductor process technology takes roughly 18 months to deliver a 100% performance improvement. Doubling every 18 months falls far short of Internet bandwidth demand, which doubles every four to six months. As of today, early generation network processors cannot scale by a factor of 4 or 16 within a two to three year time window. Overcoming Moore's law is a non-trivial task. FIG. 2 illustrates the Moore's law curve versus the bandwidth demand curve.
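The gap between the two doubling rates can be made concrete with a short calculation. The following is a minimal sketch, assuming pure exponential growth at the doubling periods stated above; the function name growth_factor is hypothetical and introduced only for illustration.

```python
def growth_factor(months, doubling_period_months):
    """Multiplicative growth after `months`, assuming one doubling
    every `doubling_period_months` (pure exponential growth)."""
    return 2 ** (months / doubling_period_months)


if __name__ == "__main__":
    for window in (24, 36):  # the two to three year window, in months
        moore = growth_factor(window, 18)        # process technology
        demand_slow = growth_factor(window, 6)   # demand doubling every 6 months
        demand_fast = growth_factor(window, 4)   # demand doubling every 4 months
        print(f"{window} months: process x{moore:.2f}, "
              f"demand x{demand_slow:.0f} to x{demand_fast:.0f}")
```

Under these assumptions, process technology improves by a factor of 4 over three years, while demand doubling every six months grows by a factor of 64 over the same period.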
The current techniques in network processor architectures are bounded by Moore's law. In general there are three approaches to NP architectures: multiple RISC engines, configurable hardware, and a mixture of RISC and hardware. The RISC architecture and instruction set were created decades ago for devices geared toward human-to-machine interaction. Network devices are not human-to-machine devices. They are machine-to-machine devices. In other words, they communicate with high-speed machines and not with humans. Multiple RISC engines within the data path of networking equipment will not meet the required bandwidth demand. Moore's law is one limiting factor. Another severe limiting factor is the complexity of the software compiler, scheduler and/or kernel needed to efficiently control and maximize the processor's operation. Creating a mini operating system is not the solution to the explosive demand in bandwidth, especially when Moore's law (hardware) cannot even meet the demand.
Configurable hardware results in the highest-performance processors. The simple software interface avoids any performance degradation. Eliminating any software within the information path and replacing it with configurable gates and transistors significantly boosts the performance of the Network Processor. At the gate level, without any creativity within the architecture, Moore's law still bounds the performance advancement of Network Processor architecture.
A mixture of multiple RISCs and configurable hardware machines comes in two different flavors. The first flavor places the RISCs in the data path and the second places the RISC processor in the control path. Traditionally, RISC processors in the control path have been limited to those external to the NP.
In addition to the processing capability of the Network Processor, another critical bottleneck in the Network Processor architecture is the memory throughput for the payload buffer. Memory technology advancement is also bounded by Moore's law. Today's generation of store-and-forward network processors uses a single-hierarchy memory organization. Bandwidth may be increased by increasing the width of the memory bus. Increasing the width of the packet memory bus, however, actually decreases the effective memory throughput for packet sizes smaller than the bus width because of the additional processing overhead.
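One way to see why a wider bus can hurt small packets is a simplified padding model. This is an illustrative sketch only, not the analysis of the invention: it assumes each packet occupies a whole number of bus transfers, so unused byte lanes in the last transfer are wasted. The function name bus_efficiency and the example sizes are hypothetical.

```python
def bus_efficiency(packet_bytes, bus_width_bytes):
    """Fraction of transferred bytes that carry packet data, assuming
    (hypothetically) that a packet always occupies whole bus transfers."""
    # Ceiling division: number of bus transfers needed for the packet.
    transfers = (packet_bytes + bus_width_bytes - 1) // bus_width_bytes
    return packet_bytes / (transfers * bus_width_bytes)


if __name__ == "__main__":
    packet = 40  # bytes; an assumed small-packet size for illustration
    for width in (16, 32, 64, 128):
        eff = bus_efficiency(packet, width)
        print(f"bus width {width:3d} B: efficiency {eff:.3f}")
```

In this model a 40-byte packet uses 62.5% of a 64-byte-wide transfer, so raw bus bandwidth grows with width while the useful fraction for small packets shrinks.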
FIG. 3 illustrates a typical memory hierarchy within a computer system using either a RISC or CISC CPU.
Due to the principle of locality, the linear multilevel memory hierarchical scheme of FIG. 3 works very well in a CPU architecture. The CPU contains very high-speed registers for immediate access. These registers are high-speed memory internal to the CPU, providing the CPU with very high-speed single-cycle access to the information. The cache is a small piece of memory and has a slightly slower access time compared to the registers. As the memory moves away from the CPU, the storage capacity increases and the access time also increases.
Caching theory works well in the computer architecture, but, unfortunately, due to the non-deterministic nature of network traffic, caching does not work well for Network Processors. The principle of locality does not apply in networking.
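The contrast between the two access patterns can be sketched with a small simulation. This is a minimal illustration, assuming a least-recently-used (LRU) cache, a loop-like trace standing in for CPU-style locality, and uniformly random keys standing in for non-deterministic network traffic; the cache size, key space, and trace lengths are hypothetical values chosen only for the example.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay `trace` through an LRU cache of `capacity` entries
    and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Loop-like trace: a working set of 32 keys swept repeatedly (strong locality).
local_trace = [i % 32 for i in range(10_000)]
# Traffic-like trace (assumption): keys drawn uniformly from a large space.
random_trace = [rng.randrange(100_000) for _ in range(10_000)]

local_rate = lru_hit_rate(local_trace, capacity=64)
random_rate = lru_hit_rate(random_trace, capacity=64)

if __name__ == "__main__":
    print(f"hit rate with locality:    {local_rate:.4f}")
    print(f"hit rate without locality: {random_rate:.4f}")
```

With the working set smaller than the cache, nearly every access in the loop-like trace hits, whereas the uniformly random trace almost never revisits a cached key.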
Therefore, it is desirable to have a system and method to efficiently access a memory unit while processing network traffic.