Legacy packet processors complete all processing tasks on a first packet data before starting processing a second packet data. Conventional pipelined packet processors have streamlined packet processing relative to legacy packet processors by allowing processing to begin on the second packet data before completing processing on the first packet data. Such conventional pipelining is generally accomplished by implementing different sets of processing tasks at different stages of a pipeline that execute independently, concurrently processing different packet data at the different stages.
One problem with the described conventional pipelined packet processors is that the stages are blocking. Specifically, the speed through the processing pipeline is only as fast as the slowest packet in the chain. A packet that completes early at a current stage must generally wait to proceed to a next stage if the next stage is busy processing another packet. Thus, the processing capabilities of the current stage may be underutilized while it waits for the next stage to complete processing.
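The blocking behavior described above can be illustrated with a small timing model. This is a minimal sketch under stated assumptions, not a description of any particular processor: it assumes there is no buffering between stages, that each stage holds at most one packet, and that a packet stays in its stage until the next stage is free. The function name `simulate_blocking_pipeline` and the cycle counts are hypothetical.

```python
def simulate_blocking_pipeline(service_times):
    """Model a pipeline with no buffers between stages.

    service_times[i][s] is the number of cycles packet i needs at stage s.
    Returns (finish_time_of_last_packet, blocked_cycles_per_stage), where
    blocked cycles are cycles a stage spent holding a finished packet
    because the next stage was still busy.
    """
    num_stages = len(service_times[0])
    # prev[s] is the time the previous packet entered stage s;
    # prev[num_stages] is the time it left the pipeline.
    prev = [0] * (num_stages + 1)
    blocked = [0] * num_stages

    for times in service_times:
        cur = [0] * (num_stages + 1)
        # A packet enters stage 0 once the previous packet has left it.
        cur[0] = prev[1]
        for s in range(num_stages):
            done = cur[s] + times[s]
            if s + 1 < num_stages:
                # The next stage is free once the previous packet has left it.
                next_free = prev[s + 2]
            else:
                next_free = done  # the last stage never blocks
            cur[s + 1] = max(done, next_free)
            blocked[s] += cur[s + 1] - done
        prev = cur

    return prev[num_stages], blocked


# Hypothetical workload: the first packet is slow at stage 1.
total, blocked = simulate_blocking_pipeline([[1, 5], [1, 1], [1, 1]])
print(total, blocked)
```

In this made-up workload, the second packet finishes stage 0 after one cycle but is held there until the slow first packet leaves stage 1, so stage 0 does no useful work during that time.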
Another problem with existing processor technology is the execution of conditional branch instructions typically carried out during packet processing at each stage. Conditional branch instructions take the form of “if <condition> then <action>.” Determining whether a branch condition is true typically requires several processor cycles during which information is fetched, decoded, executed and written. Because the next conditional branch instruction for execution in a series depends upon the previous branch condition result, existing processors have either waited several processor cycles for the actual result to be returned, or have continued processing based on a predicted result. Both of these solutions can result in severe timing penalties. Waiting for the actual result can substantially slow down processing, while proceeding based on a predicted result can lead to loading of incorrect instructions that may later have to be flushed.
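The trade-off between the two approaches can be made concrete with a rough cost model. This is only an illustrative sketch: the function names, the resolve latency, the flush penalty, and the branch outcomes below are all assumed values chosen for the example, and the model ignores every effect other than stall cycles and flush cycles.

```python
def stall_cost(num_branches, resolve_latency):
    """Extra cycles spent if the processor waits for every branch result."""
    return num_branches * resolve_latency


def prediction_cost(outcomes, predictions, flush_penalty):
    """Extra cycles spent if the processor proceeds on predicted results
    and flushes incorrectly loaded instructions after each misprediction."""
    mispredictions = sum(
        1 for actual, guess in zip(outcomes, predictions) if actual != guess
    )
    return mispredictions * flush_penalty


# Hypothetical series of four conditional branches.
outcomes = [True, False, True, True]     # actual branch results (assumed)
predictions = [True, True, True, False]  # predicted results (assumed)

print(stall_cost(len(outcomes), resolve_latency=4))
print(prediction_cost(outcomes, predictions, flush_penalty=6))
```

With these assumed numbers, waiting costs cycles on every branch, while predicting costs cycles only on the two mispredicted branches but at a higher price per miss. Either approach carries a timing penalty.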
Accordingly, there is a need for a packet processor with improved throughput and processing efficiency. The processing capabilities of such a processor should not be underutilized while waiting for a next stage to become available and/or waiting for branch condition results. At the same time, such a processor should not be vulnerable to the risks of branch misprediction.