1. Field of the Invention
The present invention relates to a data processing apparatus and method for pre-decoding instructions, in which the pre-decoded instructions are placed in a cache for subsequent decoding and use by processing circuitry.
2. Description of the Prior Art
In a typical data processing apparatus, significant power is consumed in decoding instructions prior to execution within the execution pipelines of the processing circuitry. This issue can become particularly problematic in processing circuitry that supports multiple instruction sets, since often multiple separate decoders will need to be provided for decoding instructions from the various instruction sets. By way of example, in some implementations approximately 15% of the processor power may be consumed by the instruction decoders.
It is typically the case that one or more caches are provided within the data processing apparatus for caching the instructions and data required by the processing circuitry. At any particular level in a cache hierarchy, separate instruction and data caches may be provided (often referred to as a Harvard architecture), or alternatively a unified cache may be provided for storing the instructions and data (often referred to as a Von Neumann architecture). When instructions are fetched from memory for storing in a cache, some known systems have employed pre-decoding mechanisms for performance orientated reasons. In accordance with such mechanisms, instructions are pre-decoded prior to storing in the cache, and in such cases the cache often then stores instructions in a wider format than the instructions stored in main memory, to accommodate the additional information produced by the pre-decoding process. To assist in improving performance when the instructions are later decoded and executed, the extra information provided in the pre-decoded instructions as stored in the cache has been used to identify branch instructions, identify classes of instructions (e.g. load/store instructions, coprocessor instructions, etc) to later assist multi-issue circuitry in dispatching particular instructions to particular execution pipelines, and to identify instruction boundaries in variable length instruction sets.
For example, the article “Performance Evaluation Of A Decoded Instruction Cache For Variable Instruction Length Computers”, IEEE Transactions on Computers, Volume 43, number 10, pages 1140 to 1150, October 1994, by G Intrater et al., discusses the storing of pre-decoded instructions in a cache. The article “The S-1 Project: Developing High-Performance Digital Computers” by L. Curtis Widdoes, Jr., Lawrence Livermore National Laboratory, 11 Dec. 1979, describes the S1 Mark IIA computer, where a decoded instruction cache expanded the 36-bit instruction word to a 56-bit instruction cache format to reduce instruction decoding time (see also the paper “Livermore S-1 Supercomputer—A Short History” appearing on the website http://www.cs.clemson.edu/˜mark/s1.html). Further, the idea of using pre-decoding mechanisms to pre-identify branches and pre-identify instruction boundaries is discussed in the AMD K5 Processor Data sheet, Publication no. 18522E-0, September 1996, Section 4.5, Innovative x86 Instruction Predecoding, page 6, which discusses adding 4 bits per instruction byte to identify start, end, opcode position, and number of Rops (RISC operations) the individual x86 instruction requires for later translation.
It is advantageous to reduce the overhead associated with the decoding circuitry used to decode pre-decoded instructions to generate control signals for controlling processing circuitry. Reducing the gate count of this decoding circuitry reduces the complexity, cost and power consumption of the apparatus.
It is known from U.S. Pat. No. 5,568,646 to provide a processor supporting more than one instruction set and to use a programmable logic array within the decode stage of the instruction pipeline to map at least some of the bits of instructions of a first instruction set into corresponding bits within a second instruction set. An instruction of the first instruction set is at least partially mapped to an instruction of the second instruction set and the mapped version is then used to generate at least some control signals for controlling the other pipeline stages.