Network processors are generally used for analyzing and processing packet data for routing and switching packets in a variety of applications, such as network surveillance, video transmission, protocol conversion, voice processing, and internet traffic routing. Early network processors relied on software-based approaches running on general-purpose processors, either singly or in a multi-core implementation, but such software-based approaches are slow. Further, increasing the number of general-purpose processors yielded diminishing performance improvements, or might actually slow down overall network processor throughput. Newer designs add hardware accelerators in a system-on-chip (SoC) architecture to offload certain tasks from the general-purpose processors, such as encryption/decryption, packet data inspection, and the like. These newer network processor designs are traditionally implemented with either i) a non-pipelined SoC architecture or ii) a fixed-pipeline SoC architecture.
In a typical non-pipelined SoC architecture, general-purpose processors are responsible for each action taken by acceleration functions. A non-pipelined SoC architecture provides great flexibility in that the general-purpose processors can make decisions on a dynamic, packet-by-packet basis, thus providing data packets only to the accelerators or other processors that are required to process each packet. However, significant software overhead is involved in those cases where multiple accelerator actions might occur in sequence.
In a typical fixed-pipeline SoC architecture, packet data flows through the general-purpose processors and/or accelerators in a fixed sequence regardless of whether a particular processor or accelerator is required to process a given packet. For example, a single accelerator within the fixed pipeline cannot be employed without employing the entire fixed pipeline. This fixed sequence might add significant overhead to packet processing and offers limited flexibility to handle new protocols, which reduces the advantage provided by using the accelerators.
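The contrast between the two architectures described above can be illustrated with a minimal sketch. The accelerator names, the representation of a packet as the set of stages it needs, and the two counters are hypothetical and chosen only for illustration; they are not drawn from any particular device. The sketch also assumes that a fixed pipeline requires no per-stage software decision.

```python
# Hypothetical accelerator stages; the names are illustrative only.
ACCELERATORS = ["classify", "decrypt", "inspect", "modify"]


def non_pipelined(packet_needs):
    """Model a non-pipelined SoC: software issues each required accelerator call.

    Returns (stages_visited, software_dispatches).
    """
    stages_visited = 0
    software_dispatches = 0
    for name in ACCELERATORS:
        if name in packet_needs:
            software_dispatches += 1  # a general-purpose processor issues the call
            stages_visited += 1       # only required accelerators see the packet
    return stages_visited, software_dispatches


def fixed_pipeline(packet_needs):
    """Model a fixed pipeline: every packet traverses every stage.

    Returns (stages_visited, software_dispatches).
    """
    stages_visited = len(ACCELERATORS)  # the whole pipeline is always employed
    software_dispatches = 0             # assumption: the sequence is fixed, no per-stage decision
    return stages_visited, software_dispatches


needs = {"decrypt"}  # a packet that requires only one accelerator
print(non_pipelined(needs))   # (1, 1)
print(fixed_pipeline(needs))  # (4, 0)
```

In this toy model, the non-pipelined design visits only the stages a packet needs but pays one software dispatch per stage, whereas the fixed pipeline avoids the dispatches but sends the packet through all four stages even when only one is required.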
Network processors implemented as an SoC having multiple processing modules might typically employ one or more hardware accelerators to implement well-defined procedures to improve the efficiency and performance of the SoC. One or more flexible or “control” points in the system might be implemented using one or more programmable processors. The one or more control processors make function calls to the one or more hardware accelerators to perform data operations for a given job. Each of these function calls requires a given amount of time to complete the data operation. If a given hardware accelerator is busy, the control processor might desirably process data or function calls for another job. When switching between jobs, the control processor might typically transfer a state of the previous job from one or more registers of the processor to a memory, and then begin processing the next job. When the busy hardware accelerator completes its operation, the control processor might switch back to the previous job by retrieving the state data from the memory and restoring the state data to the registers. Typical SoCs have numerous processors and, thus, large amounts of processor state data that might need to be written to, and read from, the memory to switch between jobs. As this amount of data increases, the control processors might suffer a loss in performance when switching between jobs. Thus, an improved system for integrating one or more general-purpose processors and one or more hardware acceleration engines is needed.
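The job-switching sequence described above can be sketched as a toy model. The `ControlProcessor` class, the register count of 32, and the `words_transferred` counter are all hypothetical; they serve only to show how the amount of state moved to and from memory grows with the number of switches.

```python
class ControlProcessor:
    """Toy model of a control processor that swaps job state through memory."""

    def __init__(self, num_registers=32):  # register count is an assumption
        self.registers = [0] * num_registers
        self.memory = {}            # job id -> saved register contents
        self.words_transferred = 0  # words written to or read from memory
        self.current_job = None

    def switch_to(self, job_id):
        # Save the state of the job being suspended, if there is one.
        if self.current_job is not None:
            self.memory[self.current_job] = list(self.registers)
            self.words_transferred += len(self.registers)
        # Restore the saved state of the job being resumed, or start clean.
        if job_id in self.memory:
            self.registers = self.memory.pop(job_id)
            self.words_transferred += len(self.registers)
        else:
            self.registers = [0] * len(self.registers)
        self.current_job = job_id


cpu = ControlProcessor(num_registers=32)
cpu.switch_to("job_a")  # first job: nothing to save or restore
cpu.registers[0] = 7    # job_a makes progress, then its accelerator is busy
cpu.switch_to("job_b")  # save job_a: 32 words written to memory
cpu.switch_to("job_a")  # save job_b (32 words) and restore job_a (32 words)
print(cpu.registers[0], cpu.words_transferred)  # 7 96
```

In this model, every switch between two in-flight jobs moves the full register set twice (one save and one restore), so the transfer cost grows with both the size of the processor state and the frequency of switching.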