The present invention relates generally to a programmable arrayed processing engine of a network switch and more particularly, to a method and apparatus for debugging failures of processors within a programmable arrayed processing engine.
Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is a processing engine that contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a processor having operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the processor. When implementing these functions, the processor generally processes "transient" data residing in a data memory in accordance with the instructions.
A high-performance processing engine may be realized by using a number of identical processors to perform certain tasks in parallel. For a purely parallel multiprocessor architecture, each processor may have shared or private access to non-transient data, such as program instructions (e.g., algorithms) stored in a memory coupled to the processor. Access to an external memory is generally inefficient because the execution capability of each processor is substantially faster than its external interface capability; as a result, the processor often idles while waiting for the accessed data. Moreover, scheduling of external accesses to a shared memory is cumbersome because the processors may be executing different portions of the program.
In an alternative implementation, the data paths may be configured as a pipeline having a plurality of processor stages. This configuration conserves internal memory space since each processor executes only a small portion of the program algorithm. A drawback, however, is the difficulty in apportioning the algorithm into many different stages of equivalent duration. Another drawback of the typical pipeline is the overhead incurred in transferring transient "context" data from one processor to the next in a high-bandwidth application.
One example of such a high-bandwidth application involves the area of data communications and, in particular, the use of a parallel, multiprocessor architecture as the processing engine for an intermediate network station. The intermediate station interconnects communication links and subnetworks of a computer network to enable the exchange of data between two or more software entities executing on hardware platforms, such as end stations. The stations typically communicate by exchanging discrete packets or frames of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP), the Internetwork Packet Exchange (IPX) protocol, the AppleTalk protocol or the DECNet protocol. In this context, a protocol consists of a set of rules defining how the stations interact with each other.
A router is an intermediate station that implements network services such as route processing, path determination and path switching functions. The route processing function determines the type of routing needed for a packet, whereas the path switching function allows a router to accept a frame on one interface and forward it on a second interface. The path determination, or forwarding decision, function selects the most appropriate interface for forwarding the frame. A switch is also an intermediate station that provides the basic functions of a bridge including filtering of data traffic by medium access control (MAC) address, "learning" of a MAC address based upon a source MAC address of a frame and forwarding of the frame based upon a destination MAC address. Modern switches further provide the path switching and forwarding decision capabilities of a router. Each station includes high-speed media interfaces for a wide range of communication links and subnetworks.
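The bridge functions named above (filtering, learning and forwarding by MAC address) can be illustrated with a minimal software sketch. The class and method names are hypothetical, and the flooding of frames with an unknown destination is an assumption taken from conventional bridge behavior rather than from this description.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of the bridge functions of a switch (illustrative only)."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        # Learned associations: MAC address -> port on which it was last seen.
        self.mac_table: Dict[str, int] = {}

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of ports on which the frame is sent."""
        # Learning: record the port on which the source MAC address arrived.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Assumption: an unknown destination is flooded to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Filtering: the destination is on the ingress segment, so drop.
            return []
        # Forwarding: send only on the port learned for the destination MAC.
        return [out_port]
```

In this sketch a frame is flooded only until its destination has itself sent a frame, after which traffic to that address is confined to a single port.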
Increases in the frame/packet transfer speed of an intermediate station are typically achieved through hardware enhancements for implementing well-defined algorithms, such as bridging, switching and routing algorithms associated with the predefined protocols. Hardware implementation of such an algorithm is typically faster than software because operations can execute in parallel more efficiently. In contrast, software implementation of the algorithm on a general-purpose processor generally performs the tasks sequentially because there is only one execution path. Parallel processing of conventional data communications algorithms is not easily implemented with such a processor, so hardware processing engines are typically developed and implemented in application specific integrated circuits (ASIC) to perform various tasks of an operation at the same time. These ASIC solutions distinguish themselves by speed and the incorporation of additional requirements beyond those of the basic algorithm functions. However, the development process for such an engine is time consuming and expensive and, if the requirements change, inefficient since a typical solution to a changing requirement is to develop a new ASIC.
Such an ASIC solution may comprise an arrayed processing engine having a plurality of processor pipelines. Each element of the processor pipeline comprises a processor complex that includes, among other things, an instruction random access memory (IRAM) for storing executable program code routines and a central processing unit (CPU) that is programmable with respect to execution of the code. Each processor complex of a pipeline performs different processing on (packet) data propagating through various "stages" of the pipeline in accordance with a programmed code segment or routine. A code entry point for a particular routine is provided by an upstream CPU of each processor complex for each downstream CPU in the pipeline, thereby rendering the program code executed by each processor dependent on other processors in the engine.
Because of the size and complexity of such a highly integrated ASIC, it is rather difficult to build entirely functioning processor complexes, especially in the early yield learning of advanced semiconductor processes. As a result, a processor complex of a pipeline may fail during production of the ASIC, causing failure of the entire pipeline because data cannot be passed among the processor complexes of the pipeline. Since the code executed by a downstream processor complex is dependent upon the "work" previously performed by an upstream processor complex, a software developer who is developing code for the downstream processor of a pipeline depends upon and expects certain operations to have been performed in order to provide the correct scenario for the code. Failure of an upstream processor complex may impact such program code development.
Data bypassing capabilities are generally not required for processor stages of a conventional pipeline processor because each processor stage is typically "hardware assisted" in that there are specific circuits associated with the function performed by the stage on data passing through the pipeline. Therefore, a subsequent processor stage generally cannot be programmed to perform the function of a previous stage, completion of which is typically required prior to performance of the subsequent stage function.
Therefore, an object of the present invention is to provide a mechanism for isolating a processor complex of an arrayed processing engine.
Another object of the invention is to provide a mechanism for supplying an independent code entry point for a programmable processor of an isolated processor complex.
Yet another object of the present invention is to provide a mechanism for advancing code execution of a processor complex within a pipeline of the arrayed processing engine having an isolated processor complex without running code on the isolated processor.
The present invention provides a processor isolation technique for enhancing debug capability in a highly integrated multiprocessor circuit containing a programmable arrayed processing engine for efficiently processing transient data within an intermediate network station of a computer network. The processing engine generally comprises an array of processor complex elements embedded among input and output buffer units. Each processor complex comprises a microcontroller (TMC) core coupled to an instruction memory and a memory manager circuit. The instruction memory allows, inter alia, programming of the array to process the transient data as stages of baseline or extended pipelines operating in parallel.
In the illustrative embodiment, the processor complexes are arrayed as rows and columns. That is, the processor complexes of each row are configured as stages of a pipeline that sequentially execute operations on the transient data, whereas the processor complexes of each column operate in parallel to perform substantially the same operation on that data, but with a shifted phase. The processor complexes of each row are connected by a data path that serially passes data and control "context" among the stages of the pipelines. This arrangement enables data processing to occur as a series of high-level pipelines that sequentially execute operations on the transient data.
In an aspect of the inventive isolation technique, a mechanism is provided for programming a code entry point for each TMC utilizing a register set that is accessible via an out-of-band bus coupled to a remote processor (RP) of the processing engine. This programmable entry point mechanism provides the flexibility of programming a TMC of a particular processor complex for code execution notwithstanding the states of other processor complexes in the pipeline. According to the invention, the programmable entry point mechanism may operate in conjunction with a bypass capability that passes transient data through a processor complex that is not functional, not running or otherwise unable to process data. Another aspect of the debug technique involves the ability to override completion control signals provided by each processor complex in order to advance a pipeline of the processing engine. This latter aspect of the invention involves a pipeline advancement mechanism that is programmable via the out-of-band RP bus and device.
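The programmable entry point and bypass behavior described above may be pictured with the following behavioral sketch. It is a software model under stated assumptions, not the circuit itself: the names `ComplexConfig`, `select_entry_point` and `process_stage`, the field layout, and the representation of code routines as a table keyed by entry point are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ComplexConfig:
    """Hypothetical debug settings for one processor complex, imagined as
    register values written over the out-of-band RP bus."""
    use_programmed_entry: bool = False  # choose the register over the upstream value
    programmed_entry: int = 0           # entry point held in the register
    bypass: bool = False                # pass context through without running the TMC


def select_entry_point(cfg: ComplexConfig, upstream_entry: int) -> int:
    """Model of a multiplexer choosing the code entry point for a TMC."""
    return cfg.programmed_entry if cfg.use_programmed_entry else upstream_entry


def process_stage(cfg: ComplexConfig, upstream_entry: int, context: int,
                  routines: Dict[int, Callable[[int], int]]) -> int:
    """Run one pipeline stage, or pass the context through when bypassed."""
    if cfg.bypass:
        # Isolated complex: the context flows through unchanged.
        return context
    entry = select_entry_point(cfg, upstream_entry)
    return routines[entry](context)
```

With `use_programmed_entry` set, the stage runs the routine selected by its own register regardless of what the upstream stage supplies, which is the independence from other processor complexes that the passage describes.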
In accordance with the present invention, the pipeline advancement and entry point mechanisms comprise programmable control circuitry contained within the input buffer unit and each processor complex of the processing engine. Specifically, a first circuit comprising a programmable register coupled to logic circuitry enables overriding of processor complex completion signals to advance execution of a pipeline in the event of failure of a processor complex within the pipeline. In addition, a second circuit comprising a programmable register set and associated multiplexing circuitry allows real-time control over the code entry point for each TMC of a processor complex, independent of code executing in other TMCs. A bypass feature of the memory manager circuit further allows data to flow through an isolated processor complex without requiring operation of the TMC within that processor complex.
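As a rough illustration of the first circuit, the override of completion signals can be modeled with bit masks. The exact logic is not given here, so the sketch assumes that each completion signal is ORed with a corresponding override bit and that the pipeline advances when every resulting bit is set. The function name and the mask encoding are hypothetical.

```python
def pipeline_may_advance(done_mask: int, override_mask: int,
                         num_complexes: int) -> bool:
    """Return True when the pipeline may advance.

    done_mask:     bit i is 1 when processor complex i signals completion.
    override_mask: bit i is 1 when the (assumed) programmable register
                   forces completion for processor complex i.
    """
    all_mask = (1 << num_complexes) - 1
    # Assumed logic: OR each completion bit with its override bit, then
    # require that every processor complex appears complete.
    effective = (done_mask | override_mask) & all_mask
    return effective == all_mask
```

For a four-stage pipeline in which complex 2 has failed and never signals completion, `pipeline_may_advance(0b1011, 0b0000, 4)` is false, whereas setting the override bit for that complex, `pipeline_may_advance(0b1011, 0b0100, 4)`, lets the pipeline advance.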
Advantageously, the novel isolation technique allows easier, faster debug of complex multiprocessor linked code. In addition, the inventive debug capability allows use of a highly integrated circuit having a plurality of processor pipelines despite the presence of certain defects within stages of the pipeline, such as non-functioning instruction memories and TMCs. The enhanced debug capability described herein enables isolation of a single processor complex, a column of the processor complexes or a row of processor complexes in the arrayed processor engine.