1. Field of the Invention
The present invention relates to a data processing apparatus and method for pre-decoding instructions, whereafter the pre-decoded instructions are placed in a cache for access by processing circuitry within the data processing apparatus.
2. Description of the Prior Art
In a typical data processing apparatus, significant power is consumed in decoding instructions prior to execution within the execution pipelines of the processing circuitry. This issue can become particularly problematic in processing circuitry that supports multiple instruction sets, since often multiple separate decoders will need to be provided for decoding instructions from the various instruction sets. By way of example, in some implementations approximately 15% of the processor power may be consumed by the instruction decoders.
It is typically the case that one or more caches are provided within the data processing apparatus for caching the instructions and data required by the processing circuitry. At any particular level in a cache hierarchy, separate instruction and data caches may be provided (often referred to as a Harvard architecture), or alternatively a unified cache may be provided for storing the instructions and data (often referred to as a Von Neumann architecture). When instructions are fetched from memory for storing in a cache, some known systems have employed pre-decoding mechanisms for performance-oriented reasons. In accordance with such mechanisms, instructions are pre-decoded prior to being stored in the cache, and in such cases the cache often stores instructions in a wider format than that used in main memory, to accommodate the additional information produced by the pre-decoding process. To help improve performance when the instructions are later decoded and executed, the extra information in the pre-decoded instructions stored in the cache has been used to identify branch instructions, to identify classes of instructions (e.g. load/store instructions, coprocessor instructions, etc.) so that multi-issue circuitry can later dispatch particular instructions to particular execution pipelines, and to identify instruction boundaries in variable-length instruction sets.
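As an illustration of the widened cache format described above, the following is a minimal sketch only, assuming a hypothetical 32-bit instruction set in which the top four bits of each instruction select its class. The encoding, the 2-bit class field, and all names (`classify`, `predecode`, `predecode_line`, `cached_class`, `cached_instruction`) are illustrative assumptions, not any particular prior-art design. Each instruction is pre-decoded on its way into the cache and stored as a 34-bit entry, so that later stages can read the class from the extra bits without re-examining the opcode.

```python
from enum import IntEnum

INSTR_BITS = 32                          # width of an instruction as held in main memory
INSTR_MASK = (1 << INSTR_BITS) - 1
CLASS_BITS = 2                           # extra pre-decode bits added to each cache entry


class InstrClass(IntEnum):
    ALU = 0
    LOAD_STORE = 1
    BRANCH = 2
    COPROCESSOR = 3


def classify(word: int) -> InstrClass:
    """Classify an instruction from its top 4 bits (hypothetical encoding)."""
    opcode = (word >> 28) & 0xF
    if opcode in (0xA, 0xB):
        return InstrClass.BRANCH
    if 0x4 <= opcode <= 0x7:
        return InstrClass.LOAD_STORE
    if 0xC <= opcode <= 0xE:
        return InstrClass.COPROCESSOR
    return InstrClass.ALU


def predecode(word: int) -> int:
    """Widen a 32-bit instruction into a 34-bit cache entry carrying its class."""
    word &= INSTR_MASK
    return (int(classify(word)) << INSTR_BITS) | word


def predecode_line(words: list[int]) -> list[int]:
    """Pre-decode every instruction of a line fetched from memory before it is cached."""
    return [predecode(w) for w in words]


def cached_class(entry: int) -> InstrClass:
    """Read the class straight from the extra bits; no opcode inspection needed."""
    return InstrClass((entry >> INSTR_BITS) & ((1 << CLASS_BITS) - 1))


def cached_instruction(entry: int) -> int:
    """Recover the original instruction word from a cache entry."""
    return entry & INSTR_MASK


if __name__ == "__main__":
    line = [0x10000001, 0x50000002, 0xA0000003, 0xC0000004]
    for entry in predecode_line(line):
        print(hex(cached_instruction(entry)), cached_class(entry).name)
```

In this sketch the class is computed once, at cache fill time, and every subsequent fetch of the same instruction reuses it; the cost is two extra bits of storage per cached instruction.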
For example, the article “Performance Evaluation Of A Decoded Instruction Cache For Variable Instruction Length Computers”, IEEE Transactions on Computers, Volume 43, number 10, pages 1140 to 1150, October 1994, by G. Intrater et al., discusses the storing of pre-decoded instructions in a cache. The article “The S-1 Project: Developing High-Performance Digital Computers” by L. Curtis Widdoes, Jr., Lawrence Livermore National Laboratory, 11 Dec. 1979, describes the S-1 Mark IIA computer, in which a decoded instruction cache expanded the 36-bit instruction word to a 56-bit instruction cache format to reduce instruction decoding time (see also the paper “Livermore S-1 Supercomputer—A Short History” appearing on the website http://www.cs.clemson.edu/~mark/s1.html). Further, the idea of using pre-decoding mechanisms to pre-identify branches and instruction boundaries is discussed in the AMD K5 Processor Data Sheet, Publication no. 18522E-0, September 1996, Section 4.5, Innovative x86 Instruction Predecoding, page 6, which describes adding 4 bits per instruction byte to identify the start, end, and opcode position of each x86 instruction and the number of Rops (RISC operations) that the instruction requires for later translation.
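The boundary-marking idea can be sketched as follows. This is a minimal illustration, not the K5's actual encoding: it keeps only a start bit and an end bit per byte (omitting opcode position and Rop count), assumes the fetched block begins on an instruction boundary, and uses a hypothetical length rule in which the low two bits of an instruction's first byte give its length minus one. The names `ByteMarks`, `instruction_length` and `mark_boundaries` are likewise illustrative. The lengths are walked serially once, at fill time, so that later stages can locate every instruction directly from the stored bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteMarks:
    """Pre-decode bits stored alongside one instruction byte in the cache."""
    start: bool = False  # this byte is the first byte of an instruction
    end: bool = False    # this byte is the last byte of an instruction


def instruction_length(first_byte: int) -> int:
    """Hypothetical length rule: the low two bits of the first byte hold length - 1."""
    return (first_byte & 0x3) + 1


def mark_boundaries(code: bytes) -> list[ByteMarks]:
    """Walk a block of fetched bytes once, tagging instruction start and end bytes."""
    marks = [ByteMarks() for _ in code]
    pos = 0
    while pos < len(code):
        last = pos + instruction_length(code[pos]) - 1
        if last >= len(code):
            # The instruction runs past the end of this block; only its start is known here.
            marks[pos] = ByteMarks(start=True)
            break
        marks[pos] = ByteMarks(start=True, end=(last == pos))
        if last != pos:
            marks[last] = ByteMarks(end=True)
        pos = last + 1
    return marks


if __name__ == "__main__":
    block = bytes([0x00, 0x01, 0xFF, 0x03, 0x10, 0x20, 0x30])
    marks = mark_boundaries(block)
    print([i for i, m in enumerate(marks) if m.start])  # → [0, 1, 3]
```

With the start bits held in the cache, a later decoder could find all instruction starts in a fetched block by inspecting those bits in parallel, instead of re-deriving each length in sequence.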
Whilst the above-mentioned pre-decoding mechanisms can improve the performance of the processing circuitry, they typically do not significantly alleviate the earlier-mentioned power cost associated with the later decoder circuits used to decode the instructions once they are output from the instruction cache. Thus, it is desirable to provide an improved pre-decoding mechanism that can reduce the power and area cost associated with those later decoder circuits.