The present invention relates to computer networks and, in particular, to a programmable arrayed processing engine architecture of a network switch.
Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is the processor or "processing engine" which contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a central processing unit (CPU) having operations which are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the CPU.
A high-performance computer may be realized by using a number of identical CPUs or processors to perform certain tasks in parallel. For a purely parallel multiprocessor architecture, each processor may have shared or private access to non-transient data, such as program instructions (e.g., algorithms) stored in a memory coupled to the processor. Access to an external memory is generally inefficient because the execution capability of each processor is substantially faster than its external interface capability; as a result, the processor often idles while waiting for the accessed data. Moreover, scheduling of external accesses to a shared memory is cumbersome because the processors may be executing different portions of the program. On the other hand, providing each processor with private access to the entire program results in inefficient use of its internal instruction memory.
In an alternative implementation, the data paths may be configured as a pipeline having a plurality of processor stages. This configuration conserves internal memory space since each processor executes only a small portion of the program algorithm. A drawback, however, is the difficulty in apportioning the algorithm into many different stages of equivalent duration. Another drawback of the typical pipeline is the overhead incurred in transferring transient "context" data from one processor to the next in a high-bandwidth application.
One example of such a high-bandwidth application involves the area of data communications and, in particular, the use of a parallel, multiprocessor architecture as the processing engine for an intermediate network station. The intermediate station interconnects communication links and subnetworks of a computer network to enable the exchange of data between two or more software entities executing on hardware platforms, such as end stations. The stations typically communicate by exchanging discrete packets or frames of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP), the Internet Packet Exchange (IPX) protocol, the AppleTalk protocol or the DECNet protocol. In this context, a protocol consists of a set of rules defining how the stations interact with each other.
A router is an intermediate station that implements network services such as route processing, path determination and path switching functions. The route processing function determines the type of routing needed for a packet, whereas the path switching function allows a router to accept a frame on one interface and forward it on a second interface. The path determination, or forwarding decision, function selects the most appropriate interface for forwarding the frame. A switch is also an intermediate station that provides the basic functions of a bridge including filtering of data traffic by medium access control (MAC) address, "learning" of a MAC address based upon a source MAC address of a frame and forwarding of the frame based upon a destination MAC address. Modern switches further provide the path switching and forwarding decision capabilities of a router. Each station includes high-speed media interfaces for a wide range of communication links and subnetworks.
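By way of illustration only, the filtering, learning and forwarding functions of a bridge described above may be sketched as follows. The class name LearningSwitch, the method handle_frame and the port numbering are hypothetical and do not appear in the description; flooding a frame whose destination address has not yet been learned is assumed here as the conventional behavior of a transparent bridge.

```python
class LearningSwitch:
    """Hypothetical model of a bridge-style switch; all names are illustrative."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port on which it was last seen

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports on which the frame is sent out."""
        # Learning: remember the port on which the source address was seen.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Assumption: an unknown destination is flooded to every port
            # except the one on which the frame arrived.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Filtering: the destination is on the arrival segment, so drop.
            return []
        # Forwarding: send the frame only toward the learned destination port.
        return [out_port]
```

In this sketch the table starts empty, so the first frame toward any station is flooded, and later frames are forwarded on a single port once that station has itself transmitted.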
The hardware and software components of these stations generally comprise a communications network and their interconnections are defined by an underlying architecture. Modern communications network architectures are typically organized as a series of hardware and software levels or "layers" within each station. These layers interact to format data for transfer between, e.g., a source station and a destination station communicating over the internetwork. Predetermined services are performed on the data as it passes through each layer and the layers communicate with each other by means of the predefined protocols. Examples of communications architectures include the IPX communications architecture and, as described below, the Internet communications architecture.
The Internet architecture is represented by four layers which are termed, in ascending interfacing order, the network interface, internetwork, transport and application layers. These layers are arranged to form a protocol stack in each communicating station of the network. The lower layers of the stack provide internetwork services and the upper layers collectively provide common network application services. For example, the network interface layer comprises physical and data link sublayers that define a flexible network architecture oriented to the implementation of local area networks (LANs). Specifically, the physical layer is concerned with the actual transmission of signals across the communication medium and defines the types of cabling, plugs and connectors used in connection with the medium. The data link layer ("layer 2") is responsible for transmission of data from one station to another and may be further divided into two sublayers: logical link control (LLC) and MAC sublayers.
The MAC sublayer is primarily concerned with controlling access to the transmission medium in an orderly manner and, to that end, defines procedures by which the stations must abide in order to share the medium. In order for multiple stations to share the same medium and still uniquely identify each other, the MAC sublayer defines a hardware or data link MAC address. This MAC address is unique for each station interfacing to a LAN. The LLC sublayer manages communications between devices over a single link of the internetwork.
The primary network layer protocol of the Internet architecture is the Internet protocol (IP) contained within the internetwork layer ("layer 3"). IP is a network protocol that provides internetwork routing and relies on transport protocols for end-to-end reliability. An example of such a transport protocol is the Transmission Control Protocol (TCP) contained within the transport layer. The term TCP/IP is commonly used to refer to the Internet architecture. Protocol stacks and the TCP/IP reference model are well known and are, for example, described in Computer Networks by Andrew S. Tanenbaum, printed by Prentice Hall PTR, Upper Saddle River, N.J. 1996.
Data transmission over the network therefore consists of generating data in, e.g., a sending process executing on the source station, passing that data to the application layer and down through the layers of the protocol stack where the data are sequentially formatted as a frame for delivery over the medium as bits. Those frame bits are then transmitted over the medium to a protocol stack of the destination station where they are passed up that stack to a receiving process. Although actual data transmission occurs vertically through the stacks, each layer is programmed as though such transmission were horizontal. That is, each layer in the source station is programmed to transmit data to its corresponding layer in the destination station. To achieve this effect, each layer of the protocol stack in the source station typically adds information (in the form of a header) to the data generated by the sending process as the data descends the stack.
For example, the internetwork layer encapsulates data presented to it by the transport layer within a packet having a network layer header. The network layer header contains, among other information, source and destination network addresses needed to complete the data transfer. The data link layer, in turn, encapsulates the packet in a frame, such as a conventional Ethernet frame, that includes a data link layer header containing information, such as MAC addresses, required to complete the data link functions. At the destination station, these encapsulated headers are stripped off one-by-one as the frame propagates up the layers of the stack until it arrives at the receiving process.
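By way of illustration only, the encapsulation and stripping of headers described above may be sketched as follows. The header layouts are deliberately simplified assumptions (address fields only) and are not the actual IP or Ethernet formats; the function names encapsulate and decapsulate and the constants MAC_LEN and IP_LEN are hypothetical.

```python
MAC_LEN = 6  # bytes per data link (MAC) address
IP_LEN = 4   # bytes per network address in this simplified header


def encapsulate(segment, src_ip, dst_ip, src_mac, dst_mac):
    """Wrap transport-layer data in a network header, then a data link header."""
    assert len(src_ip) == len(dst_ip) == IP_LEN
    assert len(src_mac) == len(dst_mac) == MAC_LEN
    # Internetwork layer: prepend source and destination network addresses.
    packet = src_ip + dst_ip + segment
    # Data link layer: prepend destination and source MAC addresses.
    frame = dst_mac + src_mac + packet
    return frame


def decapsulate(frame):
    """Strip the headers one by one as the frame moves up the stack."""
    packet = frame[2 * MAC_LEN:]    # remove the data link layer header
    segment = packet[2 * IP_LEN:]   # remove the network layer header
    return segment
```

Each layer in the sketch touches only its own header, which mirrors the horizontal view in which a layer of the source station communicates with its corresponding layer in the destination station.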
Increases in the frame/packet transfer speed of an intermediate station are typically achieved through hardware enhancements for implementing well-defined algorithms, such as bridging, switching and routing algorithms associated with the predefined protocols. Hardware implementation of such an algorithm is typically faster than software because operations can execute in parallel more efficiently. In contrast, software implementation of the algorithm on a general-purpose processor generally performs the tasks sequentially because there is only one execution path. Parallel processing of conventional data communications algorithms is not easily implemented with such a processor, so hardware processing engines are typically developed and implemented in application-specific integrated circuits (ASICs) to perform various tasks of an operation at the same time. These ASIC solutions, which are generally registers and combinational logic configured as sequential logic circuits or state machines, distinguish themselves by speed and the incorporation of additional requirements beyond those of the basic algorithm functions. However, the development process for such an engine is time consuming and expensive and, if the requirements change, inefficient since a typical solution to a changing requirement is to develop a new ASIC.
Another approach to realizing a high-performance, high-bandwidth network processing engine involves the use of specialized switching hardware to perform a subset of the network functions with the remaining functions executed in software. Examples of such hybrid processing engines are those included in the 7000 and 7500 family of routers manufactured by Cisco Systems, Inc. of San Jose, Calif. The 7000 processing engine comprises a hierarchy of three processors: an interface processor (IP) which handles maintenance of interfaces to external media, a switching processor (SP) that performs switching functions for the router and a routing processor (RP) that is responsible for administration of routing databases. The RP is typically a general-purpose processor that executes a real-time operating system in tandem with the SP, which is a programmable hardware engine optimized for high-performance operations. Instead of using two processors to split tasks directed to information in shared memory, the 7500 series of routers combines the RP and SP into a single general-purpose routing switch processor.
The single, general-purpose processor is generally not fast enough to perform layer 2 or 3 switching operations of frames/packets at line rates (e.g., OC12, OC48 or OC192) of the station's high-speed media interfaces. This is primarily because the bandwidth of the Internet is growing exponentially and significantly faster than the performance capabilities of currently-available data communications equipment. Use of a separate processor for each interface introduces data coherency issues with respect to, e.g., offloading routing tables to each of the interfaces. Solutions to these coherency issues, including updates to the tables, are time consuming and expensive.
Thus, an object of the present invention is to provide a processor architecture that approaches the speed of an ASIC solution but with the flexibility of a general-purpose processor.
Another object of the present invention is to provide a processing engine for an intermediate network station that efficiently executes conventional network service algorithms.
Still another object of the present invention is to provide a processing engine of an intermediate network station capable of processing frames/packets at the line rate of high-speed media interfaces.
The present invention relates to a programmable arrayed processing engine for efficiently processing transient data within an intermediate network station of a computer network. The engine generally comprises an array of processing elements embedded among input and output buffer units with a plurality of interfaces from the array to an external memory. The external memory stores non-transient data organized within data structures, such as forwarding and routing tables, for use in processing the transient data. Each processing element contains an instruction memory that allows programming of the array to process the transient data as stages of baseline or extended pipelines operating in parallel.
In the illustrative embodiment, the processing elements are symmetrically arrayed as rows and columns. That is, the processing elements of each row are configured as stages of a pipeline that sequentially execute operations on the transient data, whereas the processing elements of each column operate in parallel to perform substantially the same operation on that data, but with a shifted phase. Specifically, the processing elements of each row are connected by a data path that serially passes data and control "context" among the stages of the pipelines. This arrangement enables data processing to occur as a series of high-level pipelines that sequentially execute operations on the transient data.
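By way of illustration only, the row and column arrangement described above may be modeled in software as follows. The timing model is an assumption made solely for this sketch and is not taken from the description: one context arrives per tick, contexts are assigned to rows in rotation, and each stage occupies its processing element for n_rows ticks. The name run_array and the schedule format are hypothetical.

```python
def run_array(contexts, stages, n_rows):
    """Simulate an array whose rows are pipelines built from the same stages.

    Returns the processed contexts and a schedule of
    (start_tick, row, column) entries, one per stage execution.
    """
    results = []
    schedule = []
    for i, ctx in enumerate(contexts):
        row = i % n_rows  # rows accept new contexts in rotation
        for col, stage in enumerate(stages):
            # The context is passed serially from stage to stage along the row.
            ctx = stage(ctx)
            # Assumed timing: context i enters at tick i and each stage lasts
            # n_rows ticks, so a row is free again when its next context arrives.
            schedule.append((i + col * n_rows, row, col))
        results.append(ctx)
    return results, schedule
```

Under these assumptions the elements of a given column all apply the same stage function, and their start ticks in adjacent rows differ by one tick, which corresponds to the shifted phase noted above.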
Because they perform similar functions, the columned processing elements require similar non-transient "table" data. Therefore, in accordance with an aspect of the invention, the external memory is partitioned into a plurality of memory resources, each of which is dedicated to a respective column of processing elements for storing only a particular type of table data. Partitioning of the external memory so that each processing element stage of a pipeline has exclusive access to a dedicated memory resource allows the arrayed processing engine to satisfy high bandwidth requirements of the station.
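By way of illustration only, the dedication of one memory resource to each column may be sketched as follows. The table contents, the exact-match lookups, the field names dst, next_hop and port, and the helper make_stage are all hypothetical; the description states only that each column's memory resource stores a particular type of table data.

```python
def make_stage(column_memory, key_field, result_field):
    """Build a stage that reads only the memory dedicated to its column."""
    def stage(ctx):
        out = dict(ctx)  # transient context handed on to the next stage
        out[result_field] = column_memory.get(ctx[key_field])
        return out
    return stage


# One memory resource per column, each holding a single type of table data.
routing_memory = {"10.0.0.2": "hopA"}  # column 0: destination -> next hop
forwarding_memory = {"hopA": 3}        # column 1: next hop -> output port

row = [
    make_stage(routing_memory, "dst", "next_hop"),
    make_stage(forwarding_memory, "next_hop", "port"),
]


def process(ctx):
    """Pass a context through every stage of one row in order."""
    for stage in row:
        ctx = stage(ctx)
    return ctx
```

Because no stage in the sketch reads a table belonging to another column, the lookups of different columns never contend for the same memory resource.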
Advantageously, the invention provides a superscalar/superpipelined processing engine architecture that delivers high-performance, parallel processing functions that approach the speed of a complete hardware solution, but with valuable flexibility. That is, the inventive architecture advantageously allows programming of each processing element stage of the arrayed processing engine which, in turn, enables operations on different algorithms and applications. The programmable nature of the elements also facilitates changes in the operations performed by each stage.