A digital signal computer, or digital signal processor (DSP), is a special-purpose computer designed to optimize performance for digital signal processing applications, such as fast Fourier transforms, digital filters, image processing, signal processing in wireless systems, and speech recognition. Digital signal processor applications are typically characterized by real-time operation, high interrupt rates, and intensive numeric computations. In addition, such applications tend to be intensive in memory access operations and to require the input and output of large quantities of data. Digital signal processor architectures are typically optimized for performing these computations efficiently. Beyond digital signal processor applications, DSPs are frequently required to perform microcontroller operations. Microcontroller operations involve the handling of data but typically do not require extensive computation.
Digital signal processors may utilize a pipelined architecture to achieve high performance. As known in the art, a pipelined architecture includes multiple pipeline stages, each of which performs a specified operation, such as instruction fetch, instruction decode, address generation, arithmetic operations, and the like. Program instructions advance through the pipeline stages on consecutive clock cycles, and several instructions may be in various stages of completion simultaneously.
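The overlap of instructions in a pipeline can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes one clock cycle and no stalls occur; the function names and the stage names passed in are hypothetical and chosen only for illustration.

```python
def pipeline_cycles(num_instructions, num_stages):
    """Cycles needed to complete a stall-free instruction stream.

    Assumes an ideal pipeline: one instruction enters per cycle and
    each stage takes exactly one cycle.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1


def occupancy(num_instructions, stages):
    """For each clock cycle, map stage name -> index of the instruction in it."""
    table = []
    for cycle in range(pipeline_cycles(num_instructions, len(stages))):
        row = {}
        for position, name in enumerate(stages):
            # Instruction i enters stage `position` on cycle i + position.
            instruction = cycle - position
            if 0 <= instruction < num_instructions:
                row[name] = instruction
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(occupancy(3, ["fetch", "decode", "execute"])):
        print(cycle, row)
```

On the third cycle of this example, three different instructions occupy the fetch, decode, and execute stages at once, which is the simultaneous partial completion described above.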
Performance can be enhanced by providing a large number of pipeline stages. The number of pipeline stages in a processor is sometimes referred to as pipeline depth. Notwithstanding the enhanced performance provided by pipelined architectures, certain program conditions may degrade performance. An example of such a program condition is a branch instruction. Branch instructions are common in most computer programs, including, for example, digital signal processor applications and microcontroller applications. When a branch instruction advances through a pipelined processor and branch prediction is not utilized, the sequential instructions that follow the branch instruction enter the pipeline behind it. If the branch is taken, the pipeline must be drained by aborting those sequential instructions and then fetching and executing instructions beginning at the branch target. The resulting branch performance penalty increases with the depth of the pipeline, because a deeper pipeline holds more instructions that must be discarded. For deeply pipelined architectures and programs having frequent branch instructions, the performance penalty is severe.
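The effect of the branch penalty on throughput can be estimated with a simple first-order model. The sketch below is an assumption made for illustration, not a model taken from this description: it treats every taken branch as costing a fixed number of flushed cycles and ignores all other stalls. The function and parameter names are hypothetical.

```python
def effective_cpi(branch_fraction, taken_fraction, flush_penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when taken branches flush the pipeline.

    branch_fraction      -- fraction of instructions that are branches
    taken_fraction       -- fraction of those branches that are taken
    flush_penalty_cycles -- cycles lost per taken branch (assumed fixed;
                            grows with pipeline depth)
    base_cpi             -- cycles per instruction with no branch penalty
    """
    return base_cpi + branch_fraction * taken_fraction * flush_penalty_cycles


if __name__ == "__main__":
    # Same hypothetical program on a shallow and a deep pipeline.
    for penalty in (3, 10):
        print(penalty, effective_cpi(0.2, 0.5, penalty))
```

Under these assumed figures (20% branches, half of them taken), raising the flush penalty from 3 to 10 cycles raises the average cost per instruction from about 1.3 to about 2.0 cycles, which illustrates why the penalty is severe for deep pipelines.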
Branch prediction techniques are known in the art. In a typical prior art approach, a branch cache contains the addresses of branch instructions and corresponding prediction information. When a branch instruction is fetched, the prediction information is used to predict whether the branch will be taken.
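A branch cache of the general kind described above can be sketched as follows. The description does not specify the form of the prediction information, so this sketch assumes a two-bit saturating counter per entry, a direct-mapped organization, and a default prediction of not taken on a miss. The class and method names are hypothetical.

```python
class BranchCache:
    """Direct-mapped cache of branch addresses with 2-bit prediction counters.

    Counter values 0-1 predict not taken; 2-3 predict taken (an assumed,
    commonly used encoding, not one specified by the text).
    """

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.tags = [None] * num_entries   # branch address stored in each entry
        self.counters = [0] * num_entries  # prediction information per entry

    def _index(self, address):
        return address % self.num_entries

    def predict(self, address):
        """Return True if the branch at `address` is predicted taken."""
        i = self._index(address)
        if self.tags[i] != address:
            return False  # miss: assume not taken
        return self.counters[i] >= 2

    def update(self, address, taken):
        """Record the actual outcome once the branch has resolved."""
        i = self._index(address)
        if self.tags[i] != address:
            # Allocate (or replace) the entry in a weak state.
            self.tags[i] = address
            self.counters[i] = 2 if taken else 1
        elif taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


if __name__ == "__main__":
    cache = BranchCache(num_entries=4)
    cache.update(0x100, taken=True)
    print(cache.predict(0x100))
```

A saturating counter is used here only because it tolerates a single atypical outcome without reversing a well-established prediction; other forms of prediction information would fit the same cache structure.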
Prior art branch prediction techniques have had drawbacks and disadvantages including, but not limited to, excessive complexity and power consumption, and limited impact on performance. Accordingly, there is a need for improved methods and apparatus for branch prediction in digital processors.