Handling packets in a network is similar to executing a program. In a computer program, the next instruction to be executed may depend on the result of executing the current instruction. For example, a conditional branch instruction may cause one instruction to be executed if a condition is equal to a first value, and a different instruction to be executed if the condition is equal to a second value. Similarly, a packet may be sent to one destination if a field of the packet has a first value, or to another destination if the field has a second value. Determining the destination of a packet may require a long sequence of such dependent tests. Consequently, network packet handling is branch dependent.
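To make this branch dependence concrete, the following Python sketch classifies an Ethernet frame through a chain of tests in which each test, and even the location of the next field to be read, depends on the outcome of the test before it. The EtherType and protocol numbers are standard values; the function name `classify` and the returned destination labels are hypothetical and chosen only for illustration.

```python
# Standard field values; the destination labels returned below are hypothetical.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
ETHERTYPE_VLAN = 0x8100
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def classify(frame: bytes) -> str:
    """Walk a chain of dependent tests; later field offsets depend on earlier outcomes."""
    if len(frame) < 14:
        return "drop"
    offset = 12  # EtherType follows the two 6-byte MAC addresses
    ethertype = int.from_bytes(frame[offset:offset + 2], "big")
    # Branch 1: an 802.1Q VLAN tag shifts every later field by 4 bytes.
    if ethertype == ETHERTYPE_VLAN:
        offset += 4
        if len(frame) < offset + 2:
            return "drop"
        ethertype = int.from_bytes(frame[offset:offset + 2], "big")
    l3 = offset + 2  # start of the network-layer header
    # Branch 2: which network-layer header follows?
    if ethertype == ETHERTYPE_IPV4:
        if len(frame) < l3 + 20:
            return "drop"
        proto = frame[l3 + 9]  # IPv4 Protocol field
    elif ethertype == ETHERTYPE_IPV6:
        if len(frame) < l3 + 40:
            return "drop"
        proto = frame[l3 + 6]  # IPv6 Next Header field (extension headers ignored)
    else:
        return "slow_path"
    # Branch 3: the transport protocol selects the destination.
    if proto == IPPROTO_TCP:
        return "tcp_queue"
    if proto == IPPROTO_UDP:
        return "udp_queue"
    return "other_ip_queue"
```

Each `if` outcome is roughly as likely to go one way as the other for mixed traffic, and no test can be resolved before the preceding one, which is the property the following paragraphs rely on.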
Thus, network packet handling, including network packet parsing, can be classified as “branch intensive,” meaning that the time to complete the task is dominated by the speed with which the required sequence of conditional branches can be executed. Most processors incur at least some penalty each time a conditional branch is executed, because certain operations that would otherwise be overlapped cannot be performed in parallel when a branch is taken. For example, a processor typically performs the logical and arithmetic operations required for one instruction while fetching the next instruction from memory. This overlap is not possible if the address of the next instruction is determined by the logical and arithmetic operations of the immediately preceding instruction. For branch intensive tasks, execution efficiency is limited by this “branch penalty.”
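One common first-order way to quantify this effect, assumed here for illustration only, adds the branch penalty, weighted by how often branches occur, to the base cycles per instruction (CPI). In the formula, f_b is the fraction of executed instructions that are conditional branches incurring the penalty and P is the penalty in cycles; the numbers in the example are illustrative, not measurements of any particular processor.

```latex
\mathrm{CPI}_{\mathrm{eff}} = \mathrm{CPI}_{\mathrm{base}} + f_b \, P
\qquad\text{e.g.}\qquad
1 + \tfrac{1}{3}\cdot 2 \approx 1.67
```

With one instruction in three being a branch and a two-cycle penalty, about 0.67 of every 1.67 cycles, or roughly 40 percent of execution time, is spent on the branch penalty.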
Some processors attempt to overcome this branch penalty by scheduling branches in advance wherever possible. For example, instructions that do not depend on the outcome of a branch are placed between the instruction that determines the branch direction and the first instruction at the branch target. However, this approach serves branch intensive tasks poorly, because such tasks typically do not have enough independent work to fill the gaps in execution created by the branches.
Other processors attempt to predict which direction a conditional branch will take and fetch the instruction for the predicted direction before the prediction can be confirmed. If the prediction is correct, the penalty is avoided. If the prediction is incorrect, the penalty is still incurred, and it may be even larger because the processor must back out of the wrong path. Branch prediction relies on the fact that certain conditional branches, such as loop termination branches, are much more likely to go in one direction than the other. Tasks such as network packet parsing, which contain many branches whose two directions are often nearly equally likely, are not well served by branch prediction.
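The contrast between these two kinds of branches can be illustrated with a simple, widely described predictor: a single 2-bit saturating counter. This Python sketch is an illustrative assumption; practical predictors are more elaborate, and the outcome sequences below are synthetic rather than measured packet traces.

```python
import random


def two_bit_accuracy(outcomes):
    """Fraction of branch outcomes correctly predicted by one 2-bit saturating counter.

    Counter states 0-1 predict "not taken" and 2-3 predict "taken"; after each branch
    the counter moves one step toward the actual outcome, saturating at 0 and 3.
    """
    counter = 1  # start in the "weakly not taken" state (an arbitrary choice)
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# A loop-termination branch: taken 9 times, then not taken once, repeated.
loop_branch = ([True] * 9 + [False]) * 100

# A header-field test whose two directions are about equally likely.
rng = random.Random(42)
field_branch = [rng.random() < 0.5 for _ in range(1000)]

print(f"loop branch:  {two_bit_accuracy(loop_branch):.2f}")  # prints 0.90
print(f"field branch: {two_bit_accuracy(field_branch):.2f}")
```

The loop branch mispredicts essentially only once per loop exit, so it is predicted correctly about 90 percent of the time, while the evenly split field test stays near 50 percent, which is no better than guessing.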
In early systems, two levels of code were used to operate the system. A program was written in machine language, and the computer executed each machine language instruction by running a separate microcode program for that instruction. These systems used wide instruction words to allow parallel processing and explicit control of branches. They also had a writable control store so that a programmer could create his or her own routines in microcode for faster processing. Linking the separate machine language instructions together introduced a certain inefficiency, which could be overcome by bypassing the machine language entirely and writing an entire function in microcode. Some of these systems specified two different next addresses, performed a test, chose one of those addresses based on the result, and then fetched the next word. Because cycle times in the technology of that era were long, this method gave reasonably high performance.
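The two-next-address scheme can be sketched as a tiny control-store interpreter. The `MicroWord` format, the `run` function, and the multiply-by-repeated-addition routine are hypothetical and do not reproduce the format of any particular historical machine; the sketch only shows that each word encodes both possible successors and that a test selects one of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]


@dataclass
class MicroWord:
    """Hypothetical wide microinstruction carrying two explicit next addresses."""
    action: Callable[[Regs], None]   # datapath operation performed this cycle
    test: Callable[[Regs], bool]     # condition evaluated after the action
    next_if_true: Optional[int]      # successor when the test is true (None = halt)
    next_if_false: Optional[int]     # successor when the test is false (None = halt)


def run(store: List[MicroWord], regs: Regs, start: int = 0) -> int:
    """Execute the control store from `start`; return the number of words executed."""
    addr: Optional[int] = start
    cycles = 0
    while addr is not None:
        word = store[addr]
        word.action(regs)
        cycles += 1
        # Both successors are already encoded in the word; the test only selects one.
        addr = word.next_if_true if word.test(regs) else word.next_if_false
    return cycles


# Hypothetical routine: acc = a * b by repeated addition.
multiply = [
    # 0: clear the accumulator; halt at once if b is zero, else go to word 1.
    MicroWord(lambda r: r.update(acc=0), lambda r: r["b"] == 0, None, 1),
    # 1: add a and decrement b in one wide word; loop until b reaches zero.
    MicroWord(lambda r: r.update(acc=r["acc"] + r["a"], b=r["b"] - 1),
              lambda r: r["b"] == 0, None, 1),
]

regs = {"a": 6, "b": 7, "acc": 0}
print(run(multiply, regs), regs["acc"])  # prints 8 42
```

Because the successor addresses are fixed in the word, the next fetch can begin as soon as the test resolves, without a separate address computation; with long cycle times this resolution fit comfortably within a single cycle.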