The invention described herein generally relates to micro-processors having an on-chip instruction cache and, more specifically, to a unique architecture for such an instruction memory which is particularly suited for enhancing the performance of instruction prefetch.
Current advances in very large scale integrated (VLSI) circuit technology permit the design of high performance micro-processors with cycle times well under 100 nano-seconds. Simultaneously, the performance of dynamic memories is improving to the point where random access memory (RAM) access times can very nearly match processor cycle times. However, the time it takes to drive addresses and data off chip, generate appropriate memory chip selects, do the memory access, perform error detection, and drive back to the CPU can add several (CPU) cycles to the "system" access time of a memory. As long as the CPU is fetching data sequentially, as in sequential instruction fetches, it can prefetch far enough in advance so that it sees a constant stream of data arriving at intervals equivalent to the RAM cycle time, which, as noted above, is comparable to the CPU cycle time. However, as soon as a branch instruction occurs, the "prefetch pipeline" is broken, and the CPU must wait for several cycles for the next instruction. With current VLSI chip densities, it is possible to add a fair amount of circuitry to the "CPU" chip, some of which may be devoted to decreasing this idle time. A standard approach is to put a small instruction memory, usually an instruction cache (I-cache), on the CPU chip.
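The cost of a broken prefetch pipeline described above may be illustrated by a minimal timing sketch. It is offered only as an illustration under simplifying assumptions: the function name `fetch_cycles`, the parameter `extra_latency`, and the example value of three idle cycles are hypothetical and are not taken from any particular system. The model assumes one instruction is delivered per cycle while the pipeline is full, and that the initial fetch and every taken branch each cost the same number of idle cycles.

```python
def fetch_cycles(stream, extra_latency=3):
    """Estimate the cycles needed to fetch an instruction stream.

    stream        -- sequence of booleans, True where the instruction
                     is a taken branch (hypothetical encoding)
    extra_latency -- idle cycles added by one off-chip round trip
                     (assumed value, for illustration only)
    """
    # The very first fetch has nothing prefetched, so it pays the
    # off-chip round trip once.
    cycles = extra_latency
    for is_taken_branch in stream:
        # While the prefetch pipeline is full, one instruction
        # arrives per cycle.
        cycles += 1
        if is_taken_branch:
            # A taken branch breaks the pipeline; the CPU idles
            # while the target is fetched from off chip.
            cycles += extra_latency
    return cycles


if __name__ == "__main__":
    sequential = [False] * 10
    one_branch = [False] * 4 + [True] + [False] * 5
    print(fetch_cycles(sequential), fetch_cycles(one_branch))
```

Under these assumptions, each taken branch adds the full off-chip latency to the fetch time, which is the idle time an on-chip instruction memory is intended to reduce.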
An example of a single-chip micro-processor having an instruction register with associated control decode or micro-control generator circuitry is disclosed in U.S. Pat. No. 4,402,042 issued to Karl M. Guttag. In this patent, the micro-processor communicates with external memory by a bidirectional multiplexed address/data bus. Each instruction produces a sequence of microcodes which are generated by selecting an entry point for the first address of the control read only memory (ROM) then executing a series of jumps depending upon the instruction. Operating speed is increased by fetching the next instruction and starting to generate operand addresses before the current result has been calculated and stored.
U.S. Pat. No. 4,390,946 to Thomas A. Lane discloses a pipeline processor wherein micro-instructions are held in a control store that is partitioned into two microcode memory banks. This system can support three modes of sequencing: single micro-instruction, sequential multiple micro-instructions, and multiple micro-instructions with conditional branching. When a conditional branch is performed, the branch-not-taken path is assumed and, if that assumption proves true, the micro-instruction following the branch is executed with no delay. If the branch is taken, the guess is purged and, following a one clock delay, the branched-to micro-instruction is executed. The Lane system supports these sequencing modes at the maximum pipeline rate.
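The branch-not-taken sequencing summarized above may be sketched as a small simulation. This is not the Lane implementation; it is a simplified model, and the control store layout, the function name `run_microprogram`, and the condition names are hypothetical. The two-bank partitioning of the control store is not modeled. The sketch assumes forward branches only, so that the micro-program terminates.

```python
def run_microprogram(control_store, conditions, taken_penalty=1):
    """Simulate branch-not-taken sequencing over a control store.

    control_store -- list of (name, target, cond) tuples; target and
                     cond are None for a non-branching micro-instruction
    conditions    -- dict mapping a condition name to True (branch
                     taken) or False (branch not taken)
    taken_penalty -- clocks lost when the guess is purged (one clock
                     in the description above)
    Returns the executed micro-instruction names and the clock count.
    """
    pc = 0
    clocks = 0
    executed = []
    while pc < len(control_store):
        name, target, cond = control_store[pc]
        executed.append(name)
        clocks += 1
        # The sequential successor is always the guess.
        guess = pc + 1
        if target is not None and conditions[cond]:
            # Branch taken: purge the guess and pay the delay.
            clocks += taken_penalty
            pc = target
        else:
            # Branch not taken (or no branch): the guess was right.
            pc = guess
    return executed, clocks


if __name__ == "__main__":
    store = [("A", None, None), ("B", 3, "z"), ("C", None, None), ("D", None, None)]
    print(run_microprogram(store, {"z": False}))
    print(run_microprogram(store, {"z": True}))
```

In this model a not-taken branch costs nothing beyond its own clock, while a taken branch costs one additional clock for the purged guess.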
U.S. Pat. No. 4,384,342 to Takao Imura et al. discloses a lookahead prefetching technique wherein a first memory address register stores the column address and module designation portions of the current effective address, a second memory address register stores the row address portion of the current effective address, and a third memory address register stores the module designation portion of the prior effective address. Since the same module is frequently accessed many times in succession, the average access time is reduced by starting an access based on the contents of the second and third memory address registers without waiting until the column address and module designation portions of the current effective address are available from storage in the first memory address register. The access is completed, after the column address and module designation portions of the current effective address are determined, if a comparator which is connected to the first and third memory address registers confirms that the same memory module is being successively accessed. If not, the modules are accessed again based upon the contents of the first and second memory address registers.
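The register arrangement described above may be sketched as follows. This is a simplified behavioral model, not the circuit of the cited patent: the function name `lookahead_accesses`, the representation of an effective address as a `(module, row, column)` tuple, and the outcome labels are all hypothetical. The sketch assumes that the first access, for which no prior module designation exists, is always restarted.

```python
def lookahead_accesses(addresses):
    """Classify each access as an early start that completes or a restart.

    addresses -- sequence of (module, row, column) tuples standing in
                 for successive effective addresses (assumed format)
    Returns a list of (started_module, final_module, outcome) tuples.
    """
    prior_module = None  # third register: module of the prior address
    log = []
    for module, row, column in addresses:
        reg2_row = row                # second register: current row
        reg1 = (column, module)       # first register: column + module
        # The access is started early using the row (register 2) and
        # the prior module (register 3), before register 1 is valid.
        started_module = prior_module
        # Comparator between register 1 and register 3.
        same_module = (reg1[1] == started_module)
        if same_module:
            outcome = "early"     # early start completes in place
        else:
            outcome = "restart"   # re-access using registers 1 and 2
        log.append((started_module, module, outcome))
        prior_module = module
    return log


if __name__ == "__main__":
    trace = [(0, 1, 2), (0, 1, 3), (1, 0, 0), (1, 5, 5)]
    for entry in lookahead_accesses(trace):
        print(entry)
```

The benefit in this model depends entirely on how often successive addresses fall in the same module; every change of module forces a restart.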