1. Field of the Invention
The present invention relates to microprocessor architectures and, in particular, to a microprocessor that partially decodes instructions retrieved from external memory before storing them in an internal instruction cache. Partially decoded instructions are retrieved from the internal cache for either parallel or sequential execution by multiple, parallel, pipelined functional units.
2. Discussion of the Prior Art
In recent years, there has been a trend in the design of microprocessor architectures from Complex Instruction Set Computers (CISC) toward Reduced Instruction Set Computers (RISC) to achieve high performance while maintaining simplicity of design.
In a CISC architecture, each macroinstruction received by the processor must be decoded internally into a series of microinstruction subroutines. These microinstruction subroutines are then executed by the microprocessor.
In a RISC architecture, the number of macroinstructions which the processor can understand and execute is greatly reduced. Further, those macroinstructions which the processor can understand and execute are very basic so that the processor either does not decode them into any microinstructions (the macroinstruction is executed in its macro form) or the decoded microinstruction subroutine involves very few microinstructions.
The transition from CISC architectures to RISC architectures has been driven by two fundamental developments in computer design that are now being extensively applied to microprocessors. These developments are integrated cache memory and optimizing compilers.
A cache memory is a small, high speed buffer located between the processor and main memory to hold the instructions and data most recently used by the processor. Experience shows that computers very commonly exhibit strong characteristics of locality in their memory references. That is, references tend to occur frequently either to locations that have recently been referred to (temporal locality) or to locations that are near others that have recently been referred to (spatial locality). As a consequence of this locality, a cache memory that is much smaller than main memory can capture the large majority of a program's memory references. Because the cache memory is relatively small, it can be realized from a faster memory technology than would be economical for the much larger main memory.
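The effect of locality on a small cache can be illustrated with a minimal sketch. It assumes a direct-mapped cache; the function name `hit_rate`, the cache dimensions, and both address traces are hypothetical and chosen only for illustration.

```python
def hit_rate(addresses, num_lines=64, line_words=4):
    """Fraction of references served by a small direct-mapped cache (assumed organization)."""
    tags = [None] * num_lines           # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // line_words      # neighbouring words share a block (spatial locality)
        index = block % num_lines       # cache line that this block maps to
        tag = block // num_lines        # identifies which block occupies that line
        if tags[index] == tag:
            hits += 1                   # block is still resident from an earlier reference
        else:
            tags[index] = tag           # miss: fetch the block from main memory
    return hits / len(addresses)


# Hypothetical trace with strong locality: a 100-word loop executed 50 times.
loop_trace = [addr for _ in range(50) for addr in range(1000, 1100)]

# Hypothetical trace with no locality: every reference lands in a new block
# that maps to the same cache line.
scattered_trace = [i * 1024 for i in range(5000)]

print(hit_rate(loop_trace), hit_rate(scattered_trace))
```

Under these assumptions, the 256-word cache serves 99.5% of the loop references and none of the scattered ones, which is the sense in which a cache much smaller than main memory depends on locality.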
Before the development of cache memory techniques for use in mainframe computers, there was a large imbalance between the cycle time of a processor and that of memory. This imbalance was a result of the processor being realized from relatively high speed bipolar semiconductor technology and the memory being realized from much slower magnetic-core technology. The inherent speed difference between logic and memory spurred the development of complex instruction sets that would permit the fetching of a single instruction from memory to control the operation of the processor for several clock cycles. The imbalance between processor and memory speeds was also characteristic of the early generations of 32-bit microprocessors. Those microprocessors would commonly take 4 or 5 clock cycles for each memory access.
Without the introduction of integrated cache memory, it is unlikely that RISC architectures would have become competitive with CISC architectures. Because a RISC processor executes more instructions than does a CISC processor to accomplish the same task, a RISC processor can deliver performance equivalent to that of a CISC processor only if a faster and more expensive memory system is employed. Integrated cache memory enables a RISC processor to fetch an instruction in the same time that an efficient processor pipeline requires to execute it.
The second development that has led to the effectiveness of RISC architectures is optimizing compilers. A compiler, which may be implemented in either hardware or software, translates a computer program from the high-level language used by the programmer into the machine language understood by the computer.
For many years after the introduction of high-level languages, computers were still extensively programmed in assembly language. Assembly language is a low-level source code language employing crude mnemonics that are more easily remembered by the programmer than object-code or binary equivalents. The advantages of improved software productivity and translatability of high-level language programming were clear, but simple compilers produced inefficient code. Early generations of 32-bit microprocessors were developed with consideration for assembly language programming and simple compilers.
More recently, advances in compiler technology have been applied to microprocessors. Optimizing compilers can analyze a program to allocate large numbers of registers efficiently and to manage processor pipeline resources. As a consequence, high-level language programs can execute with performance comparable to or exceeding that of assembly language programs.
Many of the leading pioneers in RISC developments have been compiler specialists who have demonstrated that optimizing compilers can produce highly efficient code for simple, regular architectures.
Highly integrated single-chip microprocessors employ both pipelined and parallel execution to improve performance. Pipelined execution means that while the microprocessor is fetching one instruction, it can be simultaneously decoding a second instruction, reading source operands for a third instruction, calculating results for a fourth instruction, and writing results for a fifth instruction. Parallel execution means that the microprocessor can initiate the operations for two or more independent instructions simultaneously in separate functional units.
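The overlap described above can be sketched as a simple timing model. The five stage names follow the text; the function names, the `issue_width` parameter, and the assumption that every instruction is independent and never stalls are illustrative simplifications, not a description of any particular microprocessor.

```python
STAGES = ["fetch", "decode", "read operands", "calculate results", "write results"]


def stage_occupancy(num_instructions, cycle):
    """Map each stage to the index of the instruction it holds in a given cycle (0-based)."""
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        instr = cycle - depth           # instruction i reaches stage s in cycle i + s
        if 0 <= instr < num_instructions:
            occupancy[stage] = instr
    return occupancy


def total_cycles(num_instructions, issue_width=1):
    """Cycles to complete all instructions when `issue_width` independent
    instructions start per cycle and nothing stalls (assumed ideal case)."""
    groups = -(-num_instructions // issue_width)    # ceiling division
    return groups + len(STAGES) - 1


print(stage_occupancy(10, 4))
print(total_cycles(10), total_cycles(10, issue_width=2))
```

In cycle 4 of this model all five stages are busy with five different instructions. Ten instructions finish in 14 cycles with pipelining alone and in 9 cycles when two start per cycle, compared with 50 cycles if each instruction had to pass through all five stages before the next one began.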
One of the main challenges in designing a high-performance microprocessor with multiple, pipelined functional units is to provide sufficient instruction memory on-chip and to access the instruction memory efficiently to control the functional units.
The requirement for efficient control of a microprocessor's functional units dictates a regular instruction format that is simple to decode. However, in conventional microprocessor architectures, instructions in main memory are highly encoded and of variable length to make efficient use of space in main memory and the limited bandwidth available between the microprocessor and the main memory.
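The decoding cost of a variable-length format can be illustrated with a minimal sketch. The encoding used here, in which the low two bits of an instruction's first byte give its length minus one, is purely hypothetical and does not correspond to any real instruction set; the 4-byte fixed width is likewise an assumption.

```python
def variable_length_boundaries(code: bytes):
    """Start offsets in a stream where (hypothetically) the low 2 bits of each
    instruction's first byte encode its length in bytes, minus one."""
    offsets, pos = [], 0
    while pos < len(code):
        offsets.append(pos)
        pos += (code[pos] & 0b11) + 1   # this instruction must be examined to locate the next
    return offsets


def fixed_length_boundaries(code: bytes, width: int = 4):
    """Start offsets when every instruction occupies `width` bytes; no
    instruction needs to be examined to find them."""
    return list(range(0, len(code), width))


# Hypothetical 10-byte stream holding instructions of 2, 1, 4, and 3 bytes.
stream = bytes([0x01, 0xAA, 0x00, 0x03, 0x11, 0x22, 0x33, 0x02, 0x44, 0x55])
print(variable_length_boundaries(stream))
print(fixed_length_boundaries(bytes(12)))
```

In the variable-length case each boundary depends on the instruction before it, so the scan is inherently sequential. In the fixed-length case every boundary is known in advance, which is what makes a regular format simple to decode, at the cost of the less compact encoding noted above.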