Microprocessors are typically connected to an external memory (RAM) by means of a read-write channel, which usually has a relatively narrow bandwidth. The program to be executed by the microprocessor is typically stored in the external RAM. Because the microprocessor must continually fetch its instructions from the RAM, the speed of system operation is limited by the relatively narrow bandwidth of the read-write channel, regardless of the speed of operation of the microprocessor itself. Accordingly, the read-write channel is a major factor limiting the power of the microprocessor.
To overcome this limitation, instruction caches are used to increase the execution efficiency of the processor. When a program is executed from the relatively large off-chip system RAM, frequently used instructions are stored in a relatively small on-chip RAM, known as an "instruction cache". The typical implementation of an instruction cache involves a special-purpose block of logic on the processor that controls the caching of instructions. This logic is responsible for determining whether the next instruction to be executed is already in the cache.
If the instruction is already in the cache, it is executed relatively quickly from the cache; otherwise, it is read relatively slowly from the off-chip RAM into the cache, and is then executed. In the latter case, some specific algorithm must be used to determine which instruction in the cache, which may already be full, should be removed to make room for the new instruction read from the off-chip RAM.
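The hit and miss behavior described above can be sketched in software as follows. This is only an illustrative model, not the circuitry of any particular processor: the class name `InstructionCache`, the dictionary standing in for the off-chip RAM, and the choice of a least-recently-used (LRU) replacement rule are all assumptions made for the example.

```python
from collections import OrderedDict


class InstructionCache:
    """Toy fully associative instruction cache (hypothetical model).

    LRU replacement is assumed here purely for illustration; the text
    only says that "some specific algorithm" chooses the victim.
    """

    def __init__(self, capacity, off_chip_ram):
        self.capacity = capacity
        self.ram = off_chip_ram      # address -> instruction (slow path)
        self.lines = OrderedDict()   # cached entries, least recently used first
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self.lines:
            # Hit: the instruction is served quickly from the cache.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: read slowly from off-chip RAM, evicting an entry if full.
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # drop the least recently used entry
        instruction = self.ram[address]
        self.lines[address] = instruction
        return instruction


ram = {addr: f"insn_{addr}" for addr in range(8)}
cache = InstructionCache(capacity=4, off_chip_ram=ram)
for addr in [0, 1, 2, 0, 1, 2, 0, 1, 2]:
    cache.fetch(addr)
print(cache.hits, cache.misses)  # prints "6 3"
```

In this trace the first pass through addresses 0 to 2 misses three times, and the two subsequent passes are served entirely from the cache.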
Although this approach, hereinafter referred to as a "hardware cache", leads to an increase in performance as compared to a processor without a cache, a problem still remains: the caching algorithm is fixed in hardware. In addition, due to chip complexity constraints, the hardware-fixed algorithm is usually rather simple. No matter what the algorithm is, it is always possible to find programs, especially large programs, for which the algorithm does not perform very well. This is because such algorithms are usually very "local" in nature and therefore either do not take into account the global structure of a program or are simply not well matched to the execution profile of some programs.
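As a hedged illustration of a program that defeats a fixed algorithm, consider a least-recently-used (LRU) policy, which is assumed here only as a representative "local" rule. A loop that is one instruction larger than the cache causes every fetch to miss, because each instruction is evicted just before it is needed again. The function name `lru_hit_rate` and the capacity of four entries are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Fraction of fetches in `trace` served by an LRU cache of `capacity` entries."""
    cache = OrderedDict()  # least recently used address first
    hits = 0
    for address in trace:
        if address in cache:
            hits += 1
            cache.move_to_end(address)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used address
            cache[address] = True
    return hits / len(trace)


CAPACITY = 4
loop_that_fits = list(range(4)) * 100   # 4-instruction loop, 100 iterations
loop_too_large = list(range(5)) * 100   # 5-instruction loop, 100 iterations

print(lru_hit_rate(loop_that_fits, CAPACITY))  # 0.99
print(lru_hit_rate(loop_too_large, CAPACITY))  # 0.0
```

A policy that knew the loop structure in advance could keep most of the five instructions resident; the purely local rule cannot.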
In addition, a hardware cache requires a possibly significant amount of chip area for the caching logic circuitry, registers, RAM blocks, and the like. By way of example, one hardware caching scheme, known in the art as a two-way associative cache, requires a fairly large on-chip RAM, on the order of 500-1000 bytes, in addition to the on-chip instruction RAM, in order to implement the caching algorithm.
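For readers unfamiliar with the two-way scheme, the following is a minimal sketch of its lookup rule under stated assumptions: the set index is taken as the address modulo the number of sets, the remaining bits form the tag, and the older of the two entries in a set is replaced on a miss. The class name and parameters are hypothetical, and the tag storage modeled by `self.sets` corresponds to the kind of extra bookkeeping memory mentioned above.

```python
class TwoWaySetAssociativeCache:
    """Toy two-way set-associative cache that tracks tags only (hypothetical model)."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        # Each set holds at most two tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        index = address % self.num_sets   # which set the address maps to
        tag = address // self.num_sets    # identifies the line within that set
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)              # mark as most recently used
            return True
        if len(ways) == 2:
            ways.pop(0)                   # evict the older of the two ways
        ways.append(tag)
        return False


cache = TwoWaySetAssociativeCache(num_sets=4)
results = [cache.access(a) for a in [0, 4, 0, 4, 8, 0]]
print(results)  # [False, False, True, True, False, False]
```

Addresses 0, 4, and 8 all map to the same set, so the third distinct address forces an eviction even though the other three sets remain empty.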