The power, speed, and complexity of integrated circuits have improved rapidly in recent years, particularly for integrated circuits (ICs) such as random access memory (RAM) chips, application specific integrated circuit (ASIC) chips, microprocessor (uP) chips, and the like. These improvements have made possible the development of system-on-a-chip (SOC) devices. An SOC device incorporates in a single IC chip many of the components of a complex electronic system, such as a wireless receiver (e.g., a cell phone, a television receiver, or the like). The primary advantages of SOC devices are lower cost, greatly decreased size, and reduced power consumption of the system.
One particularly important application of an SOC device is the network processing unit (NPU). With the recent and on-going explosion of low-cost high bandwidth technology, intensive processing tasks and service hosting are moving closer to consumers on the “intelligent edge” of the network, where a significant portion of the future storage, processing and network management will take place. This is particularly true of ultra-high bandwidth fibre communications, which are radically shifting preconceptions about where computation and storage should take place.
In the labs of leading telecom companies, a throughput of 6.4 Terabits/second (6400 Gigabits/second) has been demonstrated over a single fibre strand by means of Wave Division Multiplexing (WDM). By comparison, the total voice traffic worldwide in 1999 was 10 Terabits/second, and the worldwide transoceanic cable capacity grew 1000% between 1999 and 2001. Consequently, communication bandwidth and the price of that bandwidth will become much less significant in the near future. This will have a dramatic effect on the complexity and protocols of communication networks. There will be a trend towards much greater simplification and efficiency through the widespread use of WDM and IP. The last mile to the user will remain a challenge, but this is being addressed progressively by xDSL, cable modems, broadband wireless and satellite links.
Network processing units are proposed to meet the explosive growth in network bandwidth and services. A network processing unit is a highly integrated set of micro-coded or hardwired accelerated engines, a memory sub-system, a high speed interconnect, and media interfaces that together tackle packet processing close to the wire. It uses pipelining, parallelism, and multi-threading to hide latency. It provides good data flow management and support for high-speed internal communications. It is able to access co-processors and is closely coupled with the media interface.
Network processing units present a whole new set of requirements. OC-12 and OC-48 network speeds are becoming common. OC-192 networks, which allow for only 52 ns of processing per packet received, are on the horizon. After that, OC-768 will soon follow, leaving only 13 ns of processing time per packet.
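The per-packet budgets quoted above can be reproduced approximately with a short calculation. The following sketch assumes back-to-back minimum-size packets of 64 bytes and the nominal SONET line rates (OC-n taken as n × 51.84 Mbit/s), and it ignores framing overhead. These assumptions, and the function name, are illustrative and are not stated in the text.

```python
# Nominal SONET OC-1 line rate in bits per second (OC-n = n * 51.84 Mbit/s).
OC1_BPS = 51.84e6


def per_packet_budget_ns(oc_level: int, packet_bytes: int = 64) -> float:
    """Return the time, in nanoseconds, to receive one packet at OC-n.

    Assumes back-to-back packets at the nominal line rate and ignores
    SONET framing overhead (an illustrative simplification).
    """
    line_rate_bps = oc_level * OC1_BPS
    return packet_bytes * 8 / line_rate_bps * 1e9


for level in (12, 48, 192, 768):
    budget = per_packet_budget_ns(level)
    print(f"OC-{level}: {budget:.1f} ns per 64-byte packet")
```

Under these assumptions the result is roughly 51 ns for OC-192 and roughly 13 ns for OC-768, in line with the figures given above.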
However, it is becoming apparent that traditional SOC devices and processors cannot keep up with the speed and programmability requirements of evolving networks. The Intel IXP 1200 is targeted at LAN-WAN switches operating at OC-48 speeds. The architecture consists of six micro-engines sharing a bus with memory. The micro-engines are managed by a StrongARM core processor. It has a PCI bus to communicate with the host CPU, memory controllers, and a bus interface to network MAC devices. The device operates at 162 MHz. Each micro-engine supports four threads, which helps to eliminate micro-engines waiting for memory resources. Micro-engines have a large register set, consisting of 128 general-purpose registers, along with 128 transfer registers. Shift and ALU operations occur in a single cycle. A hardware hash unit is responsible for the generation of 48 or 64-bit adaptive polynomial hash keys. Multiple IXP 1200 units can be aggregated in serial or parallel.
MMC has developed the AnyFlow 5000 network processor. It has five different stages: ingress processing, switching, queuing, scheduling, and egress processing. Per-flow queuing is used, which allows each flow to be queued independently. Other functions handled on a per-flow basis are queuing control and scheduling. MMC also has developed the nP3400, which integrates a programmable packet processor, a switch fabric, and multiple Ethernet interfaces on a single chip. It contains two programmable 200-MHz RISC processors and a 4.4 Gb/s switch fabric. It has policy engines supporting 128 rules.
IBM has developed the Rainier NPU. It has sixteen programmable protocol processors and a PowerPC control processor. It has hardware accelerators to perform tree searches, frame forwarding, filtering and alteration. Each processor has a 3-stage pipeline (fetch, decode, execute) and runs at 122 MHz. Each processor has seven coprocessors associated with it, including coprocessors for checksum, string copy, and flow information.
Instruction-set definition, pipelining, parallelism, multithreading, fast interconnect, and semiconductor technology all combine to produce a network processor capable of OC-192 speeds and higher. Speed-up is possible through an enhanced instruction set that is designed specifically for network-oriented applications. There are specific instructions for field extraction, byte alignment, comparisons, boolean computations, endianness conversion, conditional opcodes used to reduce branches, and more powerful network-specific computational instructions.
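To illustrate the kind of work that such instructions target, the following sketch performs field extraction and byte-order conversion in software. The function names and the use of the first 32 bits of an IPv4 header as sample data are illustrative assumptions. A network-oriented instruction set aims to collapse multi-step operations of this kind into fewer instructions.

```python
import struct


def extract_field(word: int, msb: int, width: int, word_bits: int = 32) -> int:
    """Extract `width` bits starting at bit offset `msb`.

    Offset 0 is the leftmost (most significant) bit of the word, which
    matches the way network header diagrams are usually drawn.
    """
    shift = word_bits - msb - width
    return (word >> shift) & ((1 << width) - 1)


def swap_endianness32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


# Sample data: first 32 bits of an IPv4 header with
# version = 4, header length = 5 words, TOS = 0, total length = 40 bytes.
first_word = 0x45000028

version = extract_field(first_word, msb=0, width=4)
header_len = extract_field(first_word, msb=4, width=4)
total_length = extract_field(first_word, msb=16, width=16)

print(version, header_len, total_length)
print(hex(swap_endianness32(first_word)))
```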
The way in which all the packet-processing engines in a network processing unit connect to internal and external resources is crucial. If a data packet processing engine is unable to continue work because it is limited by a slow interconnection network in the network processing unit (NPU), then much of the processing power is wasted. A primary source of delay in the interconnection network of an NPU, and of many other system-on-a-chip (SOC) devices, is the number of data links that a data packet must traverse to get from a source node to a destination node within the device. Unfortunately, eliminating all multiple-hop paths by connecting every processing node directly to every other processing node, such as by means of an N×N crossbar, results in a complex interconnection network that reduces the speed of data transfers due to the physical length of the interconnections and interference between the interconnections.
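The trade-off described above can be made concrete with a small model. The following sketch is illustrative only and is not taken from any particular NPU. It compares a bidirectional ring with a fully connected network of N nodes, counting the point-to-point links that each requires and computing the worst-case and average hop counts by breadth-first search. The choice of topologies and all function names are assumptions made for the example.

```python
from collections import deque
from itertools import combinations


def ring(n):
    """Adjacency sets for a bidirectional ring of n nodes."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fully_connected(n):
    """Adjacency sets for a network in which every node links to every other."""
    return {i: set(range(n)) - {i} for i in range(n)}


def hops_from(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def summarize(adj):
    """Return (link count, worst-case hops, average hops) for a topology."""
    n = len(adj)
    links = sum(len(neighbors) for neighbors in adj.values()) // 2
    all_hops = [hops_from(adj, s)[d] for s, d in combinations(range(n), 2)]
    return links, max(all_hops), sum(all_hops) / len(all_hops)


for name, build in (("ring", ring), ("fully connected", fully_connected)):
    links, worst, average = summarize(build(16))
    print(f"{name}: {links} links, worst case {worst} hops, average {average:.2f}")
```

For 16 nodes, the ring needs only 16 links but a packet may traverse up to 8 of them, whereas the fully connected network delivers every packet in a single hop at the cost of 120 links. The link count of the fully connected case grows as N(N-1)/2, which is the source of the wiring length and interference problems noted above.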
Therefore, there is a need in the art for an improved interconnection architecture for system-on-a-chip (SOC) devices and other large scale integrated circuits. In particular, there is a need for an interconnection architecture that minimizes the delay in transferring data between processing nodes in an SOC device, such as a network processing unit. More particularly, there is a need for an interconnection architecture that minimizes the number of hops (or data transfers) between processing nodes in an SOC device, such as a network processing unit.