1. Field of the Invention
The present invention relates to communication systems, in particular, to data caching and coherency maintenance for an accelerated processor architecture for packet networks.
2. Description of the Related Art
Network processors are generally used for analyzing and processing packet data for routing and switching packets in a variety of applications, such as network surveillance, video transmission, protocol conversion, voice processing, and internet traffic routing. Early types of network processors were based on software-based approaches with general-purpose processors, either singly or in a multi-core implementation, but such software-based approaches are slow. Further, increasing the number of general-purpose processors diminished performance improvements, or actually slowed down overall network processor throughput. Newer designs add hardware accelerators to offload certain tasks from the general-purpose processors, such as encryption/decryption, packet data inspections, and the like. These newer network processor designs are traditionally implemented with either i) a non-pipelined architecture or ii) a fixed-pipeline architecture.
In a typical non-pipelined architecture, general-purpose processors are responsible for each action taken by acceleration functions. A non-pipelined architecture provides great flexibility in that the general-purpose processors can make decisions on a dynamic, packet-by-packet basis, thus providing data packets only to the accelerators or other processors that are required to process each packet. However, significant software overhead is involved in those cases where multiple accelerator actions might occur in sequence.
In a typical fixed-pipeline architecture, packet data flows through the general-purpose processors and/or accelerators in a fixed sequence regardless of whether a particular processor or accelerator is required to process a given packet. This fixed sequence might add significant overhead to packet processing and has limited flexibility to handle new protocols, limiting the advantage provided by using the accelerators. Network processors implemented as a system on chip (SoC) having multiple processing modules might typically classify an incoming packet to determine which of the processing modules will perform operations for the particular packet or flow of packets.
A network processor in a switching network might provide transport of received data packets from an input port to one (unicast) or more (multicast) output ports of the network. Received data packets are provided to one or more output ports according to a scheduling algorithm. Some network switches provide multicasting by replicating received packets at the output port(s) corresponding to the received packet. Multicast packets might be replicated as many times as the number of output ports to which the multicast packet is to be broadcast. Thus, in some network switches, large amounts of packet data are replicated to enable multicasting.