1. Field of the Invention
The present invention relates to communication systems and, in particular, to an accelerated processor architecture for network communications.
2. Description of the Related Art
Network processors are generally used for analyzing and processing packet data for routing and switching packets in a variety of applications, such as network surveillance, video transmission, protocol conversion, voice processing, and internet traffic routing. Early types of network processors were based on software-based approaches with general-purpose processors, either singly or in a multi-core implementation, but such software-based approaches are slow. Further, increasing the number of general-purpose processors yielded diminishing performance improvements, or might actually slow down overall network processor throughput. Newer designs add hardware accelerators to offload certain tasks from the general-purpose processors, such as encryption/decryption, packet data inspection, and the like. These newer network processor designs are traditionally implemented with either i) a non-pipelined architecture or ii) a fixed-pipeline architecture.
In a typical non-pipelined architecture, general-purpose processors are responsible for each action taken by acceleration functions. A non-pipelined architecture provides great flexibility in that the general-purpose processors can make decisions on a dynamic, packet-by-packet basis, thus providing data packets only to the accelerators or other processors that are required to process each packet. However, significant software overhead is involved in those cases where multiple accelerator actions might occur in sequence.
In a typical fixed-pipeline architecture, packet data flows through the general-purpose processors and/or accelerators in a fixed sequence regardless of whether a particular processor or accelerator is required to process a given packet. This fixed sequence might add significant overhead to packet processing and has limited flexibility to handle new protocols, limiting the advantage provided by using accelerators.
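The trade-off between the two traditional architectures described above can be illustrated with a minimal sketch. The stage names, the `needed` set, and the counters below are hypothetical and chosen only for illustration; the model assumes, as a simplification, that a fixed pipeline needs no per-stage software dispatch and that a non-pipelined design needs exactly one dispatch per accelerator action.

```python
# Hypothetical accelerator stages, listed in fixed-pipeline order.
STAGES = ["classify", "decrypt", "inspect", "modify"]


def fixed_pipeline(needed):
    """Every packet visits every stage, whether or not the stage is needed."""
    visits = len(STAGES)
    useful = sum(1 for stage in STAGES if stage in needed)
    return {
        "stage_visits": visits,
        "wasted_visits": visits - useful,
        "cpu_dispatches": 0,  # assumption: hardware sequencing, no software step
    }


def non_pipelined(needed):
    """A general-purpose processor dispatches each required action itself."""
    visits = sum(1 for stage in STAGES if stage in needed)
    return {
        "stage_visits": visits,
        "wasted_visits": 0,
        "cpu_dispatches": visits,  # assumption: one software dispatch per action
    }


# A packet that only needs classification and inspection.
print(fixed_pipeline({"classify", "inspect"}))
print(non_pipelined({"classify", "inspect"}))
```

Under these assumptions, the fixed pipeline wastes two stage visits on this packet, while the non-pipelined design avoids the wasted visits at the cost of two software dispatches.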
A network processor that generates output packets (“reassemblies”) might typically store reassembly data in a shared memory due to the potentially large size of the reassemblies. A network processor might be implemented as a system on chip (SoC) having multiple processing modules that might concurrently access the shared memory. The overall packet throughput of the network processor therefore might depend in part on the efficiency of each processing module's interface to the shared memory. A typical shared memory might require some setup time (one or more clock cycles) before each data transfer, during which no data is transferred. Further, the data interface to a shared memory might typically be many bytes wide, while only some of those bytes might carry valid data in a given transfer.
Typical interfaces to shared memory might write to the shared memory as soon as data is available from an input packet. This might result in wasted clock cycles accessing the shared memory, since one or more setup cycles are required for each write transfer. Other typical interfaces to shared memory might wait for all the data of a single reassembly to be available, and then write all the data to the shared memory. This might reduce the number of setup cycles when input packets are long, but would still require many setup cycles if there are many small input packets. If there are many small input packets, the memory efficiency might also suffer because, for many write cycles, less than the entire width of the memory interface is valid data. Further, a large amount of local storage might be required to hold the data for a single large input packet.
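The cost of setup cycles and partially filled bus words can be made concrete with a small worked calculation. The bus width (16 bytes) and setup time (2 cycles) below are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import math


def write_cycles(transfer_sizes, bus_width_bytes=16, setup_cycles=2):
    """Total clock cycles when each entry is written as a separate transfer."""
    total = 0
    for size in transfer_sizes:
        # Each transfer pays the setup cost, then one cycle per bus word.
        total += setup_cycles + math.ceil(size / bus_width_bytes)
    return total


def bus_efficiency(transfer_sizes, bus_width_bytes=16, setup_cycles=2):
    """Fraction of the available bus bandwidth that carries valid data."""
    cycles = write_cycles(transfer_sizes, bus_width_bytes, setup_cycles)
    return sum(transfer_sizes) / (cycles * bus_width_bytes)


# Eight 20-byte writes: 8 * (2 setup + 2 data) = 32 cycles.
print(write_cycles([20] * 8))    # 32
print(bus_efficiency([20] * 8))  # 0.3125

# The same 160 bytes written as one transfer: 2 setup + 10 data = 12 cycles.
print(write_cycles([160]))       # 12
```

With these assumed parameters, each 20-byte write occupies two data cycles, the second of which carries only 4 valid bytes out of 16. The eight-write case applies equally to one reassembly written piecemeal and to eight small reassemblies each written whole, which is why waiting for a complete reassembly does not help when input packets are small.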
In general, a network processor requires some finite amount of time to process a given input packet. While the network processor is processing a given input packet, additional input packets might be received and might typically be stored in an input queue. If the rate at which input packets are received is greater than the rate at which the network processor can process the input packets, the input queue might become full and, thus, be unable to store any additional input packets. When the input queue becomes full, the network processor might drop one or more input packets.
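The overflow behavior described above can be sketched as a bounded queue that discards arrivals once it is full. The arrival rate, service rate, and queue capacity are assumed values for illustration only, and dropping the newest arrival (tail drop) is one possible policy, not necessarily the one a given network processor uses.

```python
from collections import deque


def simulate_input_queue(arrivals_per_tick, service_per_tick, capacity, ticks):
    """Return (processed, dropped, still_queued) after the given number of ticks."""
    queue = deque()
    processed = 0
    dropped = 0
    next_id = 0
    for _ in range(ticks):
        # Packets arrive; any packet that finds the queue full is dropped.
        for _ in range(arrivals_per_tick):
            if len(queue) < capacity:
                queue.append(next_id)
            else:
                dropped += 1
            next_id += 1
        # The processor drains up to service_per_tick packets per tick.
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()
            processed += 1
    return processed, dropped, len(queue)


# Arrivals (3 per tick) outpace processing (2 per tick) with a 4-entry queue.
print(simulate_input_queue(3, 2, 4, 10))  # (20, 8, 2)
```

In this run the queue fills during the third tick, after which one packet is dropped on every tick for as long as the arrival rate exceeds the processing rate.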
Dropping input packets could result in the corresponding reassembly missing data, or including incorrect data. The network processor might eventually transmit the reassembly as an output packet, even though the data of the reassembly is incorrect due to the input packet being dropped. Further, there might not be any indication of this corruption to the destination device that receives the output packet. Alternatively, dropping input packets might also result in a given reassembly not being transmitted, if the dropped input packet is the final packet of the reassembly.
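Both failure modes can be shown with a minimal sketch, assuming a hypothetical reassembly that simply concatenates input packet payloads and is transmitted when a packet flagged as final arrives. The `reassemble` function and the `(payload, is_last)` representation are illustrative assumptions, not a description of any particular network processor.

```python
def reassemble(fragments, dropped_indices=frozenset()):
    """Concatenate payloads until the final fragment is seen.

    fragments: list of (payload: bytes, is_last: bool) tuples.
    Returns the output packet, or None if the final fragment never arrives.
    """
    buffer = bytearray()
    for index, (payload, is_last) in enumerate(fragments):
        if index in dropped_indices:
            continue  # the input queue was full, so this packet is lost
        buffer += payload
        if is_last:
            return bytes(buffer)
    return None  # the reassembly is never completed or transmitted


fragments = [(b"AB", False), (b"CD", False), (b"EF", True)]

print(reassemble(fragments))                       # b'ABCDEF'
print(reassemble(fragments, dropped_indices={1}))  # b'ABEF'
print(reassemble(fragments, dropped_indices={2}))  # None
```

When the middle packet is dropped, the output packet is still transmitted, but with missing data and nothing to mark it as corrupt. When the final packet is dropped, no output packet is produced at all.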