With advances in very large scale integration (VLSI) and supercomputing, a processor with computational capability similar to a supercomputer can be fabricated on a single chip. Although improvements in integrated circuit technology have resulted in significantly reduced gate delays, the speed and density of memory components have not been improved proportionately. Consequently, the overall performance of computers using these processors is usually limited by the memory system speed. Cache memories are used to bridge the gap between memory and processor speeds.
Cache memory is a high speed buffer memory that interfaces between a computer processing unit and main memory. As used herein, the term processing unit may refer to a central processing unit or to a multiprocessor processing element. The cache memory is intended to maintain quickly accessible copies of the data and instructions that are most likely to be needed by the processor. The cache memory can be much smaller than the main memory, and it is therefore feasible to implement cache memory using faster, more expensive technology than can be economically used to implement a main memory. If the cache holds the appropriate data and instructions, the processing unit effectively sees only the fast cache access time, yet has the large main memory address space. Moreover, a properly managed cache system can have an indirect beneficial effect on computational efficiency by reducing system bus traffic.
A processing unit operates by sequentially executing a program of instructions which are stored in addressed locations of the main memory. Program blocks containing instructions for sequential execution are stored in contiguous memory addresses. The processing unit sequentially requests these instructions from memory via a program counter register which is incremented to point at a new instruction code during each instruction cycle. As long as the program flow remains sequential, cache memory operation is easily implemented by prefetching instruction codes from memory locations one or more lines ahead of the address held in the program counter. The prefetched instructions are then available in high speed cache memory when they are actually addressed by the processor, and a "cache hit" is said to be achieved. However, if the program flow requires a branch or loop to a non-sequential instruction address, the requested instruction code may not be present in the cache memory when it is requested by the processor, and a "cache miss" is said to occur. When a cache miss occurs, processing must be suspended while the data is fetched from main memory.
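The hit and miss behavior described above can be illustrated with a minimal sketch. The model is deliberately simplified and is not taken from any particular design: the function name `simulate`, the parameters `line_size` and `prefetch_lines`, and the assumption of a cache with unlimited capacity are all hypothetical choices made only for illustration.

```python
def simulate(trace, line_size=4, prefetch_lines=1):
    """Count (hits, misses) for a trace of instruction addresses.

    Assumptions (illustrative only): the cache stores whole lines, never
    evicts anything, and on every access also prefetches the next
    `prefetch_lines` lines after the line being accessed.
    """
    cached_lines = set()
    hits = 0
    misses = 0
    for address in trace:
        line = address // line_size
        if line in cached_lines:
            hits += 1
        else:
            # Cache miss: the line must be fetched from main memory.
            misses += 1
            cached_lines.add(line)
        # Prefetch the lines ahead of the current program counter value.
        for offset in range(1, prefetch_lines + 1):
            cached_lines.add(line + offset)
    return hits, misses


# Purely sequential flow: only the very first access misses.
sequential_trace = list(range(32))
print(simulate(sequential_trace))

# A branch from address 7 to address 100 lands outside the prefetched
# lines, so the branch target causes a second miss.
branching_trace = list(range(8)) + list(range(100, 108))
print(simulate(branching_trace))
```

In this sketch the sequential trace misses only on its first address, while the branching trace misses a second time at the branch target, because prefetching follows the program counter and cannot anticipate the non-sequential address.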
The design goals for a good cache memory system are therefore: the cache hit ratio should be high so that the processing unit does not need to wait for instructions; the bus traffic should be as low as possible so that the chance of bus contention between data and instruction accesses is reduced (bus traffic is a particularly important performance bottleneck for single chip systems because the total number of physical I/O pins on a chip is often limited); and efficient use should be made of chip area, since chip area is always expensive and limited. The last criterion implies that the cache control hardware should be as simple as possible.
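The importance of the first goal can be made concrete with the commonly used weighted-average model of memory access time. The sketch below is illustrative only: the function name `average_access_time`, its parameters, and the cycle counts in the usage lines are assumptions chosen for the example and do not come from the text above. The model also assumes that a miss costs one full main memory access.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Average access time seen by the processing unit.

    Assumed model: a hit costs `t_cache`, a miss costs `t_main`, and
    `hit_ratio` is the fraction of accesses that hit (0.0 to 1.0).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_main


# Hypothetical timings: 1 cycle for a cache hit, 10 cycles for main memory.
for ratio in (0.5, 0.9, 0.99):
    print(ratio, average_access_time(ratio, t_cache=1, t_main=10))
```

Under these assumed timings, the average access time approaches the cache access time only as the hit ratio approaches one, which is why a high hit ratio is listed first among the design goals.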