1. Field of the Invention
This invention is related to the field of processors and, more particularly, to instruction decoding in processors.
2. Description of the Related Art
Processors often attempt to decode multiple instructions per clock cycle in order to provide high performance instruction execution (e.g. to supply superscalar execution resources). Often, the decoding of multiple instructions complicates the decoding process, and thus may lengthen the decode latency. The decode latency may be of particular concern when a control transfer occurs (e.g. a branch instruction or other instruction which redirects fetching from a sequential execution path to another address). A new fetch address is generated in response to executing a control transfer instruction, and the next instruction to be executed does not reach the execution resources until the decode latency has expired.
Decoding multiple instructions per clock cycle may increase the decode latency, for example, if the instruction set specifies variable length instructions. The instructions in a sequential code sequence are stored in consecutive bytes of memory. With variable length instructions, the initial instruction in a sequential code sequence generally must be decoded enough to determine its length in order to locate the start of the next instruction. The serial process of locating instructions may increase the decode latency. Some processors implement predecode data (e.g. a start bit and/or end bit associated with each byte that indicates if the byte is a start or end of an instruction) to simplify the process of locating multiple instructions in a decoder. However, even with the predecode data, locating the initial instruction is quicker than locating subsequent instructions. The location of the initial instruction is indicated by a start pointer provided with the instruction bytes. To locate other instructions, the decode unit must process the predecode data (or decode the initial instruction, if predecoding is not implemented). After locating one or more other instructions, the decode unit may update the start pointer to point to the next byte after the last located instruction and may update the predecode data to mask out the located instructions. The update permits, in a subsequent clock cycle, locating additional instructions within the set of instruction bytes. However, this feedback process may set an upper bound on the maximum operating frequency of the decode unit.
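By way of illustration only, the locating process described above may be modeled in software as follows. The sketch is not taken from any actual processor: it assumes a hypothetical toy encoding in which the first byte of each instruction holds that instruction's total length in bytes, and the names locate_serial, predecode_start_bits, scan_cycle and locate_all are likewise hypothetical. The first function models the serial length-by-length search; the remaining functions model start-bit predecode data and the per-cycle feedback of the start pointer and masked predecode data.

```python
def locate_serial(code):
    """Locate instruction starts without predecode data.

    Toy assumption: the first byte of each instruction is its length.
    Each start depends on decoding the previous instruction's length,
    which is the serial dependency described in the text.
    """
    starts = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if length < 1:
            raise ValueError("toy encoding requires a length of at least 1")
        starts.append(pos)
        pos += length
    return starts


def predecode_start_bits(code):
    """Build one start bit per byte (1 marks the first byte of an instruction)."""
    bits = [0] * len(code)
    for start in locate_serial(code):
        bits[start] = 1
    return bits


def scan_cycle(start_bits, start_ptr, max_per_cycle):
    """Model one clock cycle of locating instructions from predecode data.

    The initial instruction is given directly by start_ptr; any further
    instruction is found by scanning for the next set start bit.
    Returns (located starts, updated start pointer, masked start bits).
    """
    n = len(start_bits)
    located = []
    pos = start_ptr
    while pos < n and len(located) < max_per_cycle:
        located.append(pos)
        pos += 1
        while pos < n and not start_bits[pos]:
            pos += 1
    # Feedback: mask out the instructions located in this cycle.
    masked = list(start_bits)
    for start in located:
        masked[start] = 0
    return located, pos, masked


def locate_all(start_bits, max_per_cycle):
    """Repeat scan_cycle, feeding each cycle's outputs into the next cycle."""
    cycles = []
    ptr = 0
    bits = list(start_bits)
    while ptr < len(bits):
        located, ptr, bits = scan_cycle(bits, ptr, max_per_cycle)
        cycles.append(located)
    return cycles


if __name__ == "__main__":
    code = bytes([3, 0, 0, 1, 2, 0, 4, 0, 0, 0, 2, 0])
    bits = predecode_start_bits(code)
    for cycle, located in enumerate(locate_all(bits, max_per_cycle=2)):
        print("cycle", cycle, "located starts", located)
```

In this model, each call to scan_cycle consumes the start pointer and masked start bits produced by the previous call, so the calls cannot overlap. That dependency corresponds to the feedback path which, in hardware, must complete within a single clock cycle.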
In some previous processors (e.g. the AMD Athlon™ line of processors available from Advanced Micro Devices, Inc., Sunnyvale, Calif.), the decode unit decodes multiple instructions per clock cycle. The decode unit operates over several pipeline stages, and multiple instructions move through the decode unit in synchronization. The multiple instructions exit the decode unit to the execution circuitry at the same time (i.e. in the same clock cycle). In such a processor, the above feedback process may either limit the operating frequency of the decode unit or may increase the decode latency.