A digital signal computer, or digital signal processor (DSP), is a special-purpose computer designed to optimize performance for digital signal processing applications, such as fast Fourier transforms, digital filters, image processing, signal processing in wireless systems, and speech recognition. Digital signal processor applications are typically characterized by real-time operation, high interrupt rates, and intensive numeric computations. In addition, digital signal processor applications tend to be intensive in memory access operations and to require the input and output of large quantities of data. Digital signal processor architectures are typically optimized for performing such computations efficiently. In addition to digital signal processor applications, DSPs are frequently required to perform microcontroller operations. Microcontroller operations involve the handling of data but typically do not require extensive computation.
Digital signal processors may utilize a pipelined architecture to achieve high performance. As known in the art, a pipelined architecture includes multiple pipeline stages, each of which performs a specified operation, such as instruction fetch, instruction decode, address generation, arithmetic operations, and the like. Program instructions advance through the pipeline stages on consecutive clock cycles, and several instructions may be in various stages of completion simultaneously.
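The overlap of instructions in a pipeline can be illustrated with a minimal sketch. The four stage names and the function `simulate_pipeline` below are assumptions chosen for illustration only; the sketch models an idealized pipeline with no stalls and does not represent any particular processor.

```python
# Illustrative stage names (an assumption; real pipelines differ in depth and naming).
STAGES = ["fetch", "decode", "address", "execute"]


def simulate_pipeline(instructions, stages=STAGES):
    """Return, for each clock cycle, which instruction occupies each stage.

    Idealized model: one instruction enters the first stage per cycle and
    every instruction advances one stage per cycle, with no stalls.
    """
    total_cycles = len(instructions) + len(stages) - 1
    timeline = []
    for cycle in range(total_cycles):
        occupancy = {}
        for position, stage in enumerate(stages):
            # The instruction that entered the first stage at cycle `index`
            # reaches the stage at `position` exactly `position` cycles later.
            index = cycle - position
            if 0 <= index < len(instructions):
                occupancy[stage] = instructions[index]
            else:
                occupancy[stage] = None  # stage is empty in this cycle
        timeline.append(occupancy)
    return timeline
```

Calling `simulate_pipeline(["i0", "i1", "i2"])` yields one dictionary per clock cycle; in the third cycle, three different instructions occupy the first three stages at once, which is the simultaneity described above.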
For compactness of code, some processors support instructions of varying lengths. For example, one processor supports 16-bit instructions, 32-bit instructions and 64-bit instructions. There are no restrictions on instruction alignment with respect to memory boundaries, so that code can be stored in memory as compactly as possible. During instruction execution, instructions are typically moved from memory to an instruction cache, which likewise places no restrictions on instruction alignment. Thus, each instruction cache line may include one or more instructions, depending on instruction length, and an instruction may straddle instruction cache lines. Instruction fetches from the instruction cache, however, are usually aligned to the cache line. Because a fetch returns line-aligned data while an instruction may begin at any offset within a line, instructions fetched from the instruction cache must be aligned before they are issued to the instruction decoder. Under ideal conditions, an aligned instruction should be issued to the instruction decoder every clock cycle.
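The alignment problem can be made concrete with a short software sketch. Everything specific in it is an assumption made for illustration: the 8-byte cache line size (`LINE_BYTES`), the length encoding in `instruction_length` (top two bits of the first byte), and the function names. The text above does not specify how instruction length is encoded, and the sketch is a sequential software model, not a description of alignment hardware.

```python
LINE_BYTES = 8  # assumed cache line size; the text does not specify one


def instruction_length(first_byte):
    """Hypothetical length encoding: top two bits of the first byte.

    0b11 -> 64-bit (8 bytes), 0b10 -> 32-bit (4 bytes), otherwise 16-bit (2 bytes).
    """
    tag = first_byte >> 6
    if tag == 0b11:
        return 8
    if tag == 0b10:
        return 4
    return 2


def fetch_line(memory, line_index):
    """Model a line-aligned fetch: always returns a whole line (or what remains)."""
    start = line_index * LINE_BYTES
    return memory[start:start + LINE_BYTES]


def align_instructions(memory, start_address=0):
    """Yield (address, instruction_bytes) for each instruction in `memory`.

    Fetches are line-aligned, but instructions may start at any offset and
    may straddle two lines, so leftover bytes are buffered between fetches.
    """
    line_index = start_address // LINE_BYTES
    # Discard bytes in the first line that precede the starting address.
    buffer = fetch_line(memory, line_index)[start_address % LINE_BYTES:]
    line_index += 1
    address = start_address
    while buffer:
        length = instruction_length(buffer[0])
        # The instruction straddles a line boundary: fetch until it is complete.
        while len(buffer) < length:
            line = fetch_line(memory, line_index)
            if not line:
                return  # truncated instruction at the end of memory
            buffer += line
            line_index += 1
        yield address, buffer[:length]
        buffer = buffer[length:]
        address += length
        if not buffer:
            buffer = fetch_line(memory, line_index)
            line_index += 1
```

With these assumptions, a 32-bit instruction beginning at byte offset 6 occupies the last two bytes of one line and the first two bytes of the next, so the model must perform a second line fetch before it can issue that instruction. Avoiding the resulting lost cycle is the kind of difficulty that an aligner issuing one instruction per clock cycle has to overcome.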
Techniques for instruction alignment are known in the prior art. However, prior art instruction alignment techniques have not provided satisfactory performance for deeply pipelined, high performance processors. Accordingly, there is a need for improved methods and apparatus for aligning variable length instructions.