The present invention relates to computer architecture and, more particularly, to computer architectures including cache memory. A major objective of the present invention is to provide for improved handling of branch instructions by cache memory.
Much of modern progress is associated with advances in computer performance. Computer performance has been enhanced by the development of ever faster microprocessors. However, increases in microprocessor speed provide diminishing returns unless the speed of their interaction with associated components is also increased.
One bottleneck has been the communication rate between a microprocessor and main memory. The instructions to be executed by the microprocessor, and the data on which those instructions operate, are stored at addresses within main memory. To access instructions and data, the microprocessor transmits addresses to main memory. The main memory decodes each address and makes the contents at that address available for reading and/or writing. The time required for the microprocessor to transmit an address to main memory and receive the respective contents therefrom can significantly constrain system performance.
Cache memories relax this performance constraint. A cache memory is a small, fast memory that keeps copies of recently used memory items. When these items are used again, for example, in a program loop, they can be accessed from the cache memory instead of main memory. The processor can then operate at the faster cache access speed, rather than the slower main memory access speed, most of the time. Cache memories are used extensively in high-performance computers and are migrating to smaller computer systems. In particular, cache memories are being implemented in the increasingly popular reduced instruction set computers (RISC), which rely on fast execution of relatively high numbers of simple instructions.
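By way of illustration only, the benefit of a cache for a program loop can be sketched in Python as follows. The class name `SimpleCache`, the helper `run_loop`, the one-cycle and ten-cycle latencies, and the oldest-first eviction are assumptions chosen for the example and are not taken from any particular system.

```python
CACHE_CYCLES = 1    # assumed cache access time (illustrative)
MEMORY_CYCLES = 10  # assumed main memory access time (illustrative)


class SimpleCache:
    """Toy cache that keeps copies of recently read memory items."""

    def __init__(self, main_memory, capacity):
        self.main_memory = main_memory  # mapping: address -> contents
        self.capacity = capacity
        self.lines = {}                 # address -> cached copy
        self.hits = 0
        self.misses = 0
        self.cycles = 0

    def read(self, address):
        if address in self.lines:
            # Hit: the item is served at cache speed.
            self.hits += 1
            self.cycles += CACHE_CYCLES
            return self.lines[address]
        # Miss: the item is fetched from main memory and a copy is kept.
        self.misses += 1
        self.cycles += MEMORY_CYCLES
        value = self.main_memory[address]
        if len(self.lines) >= self.capacity:
            # Evict the oldest inserted item (dicts keep insertion order).
            del self.lines[next(iter(self.lines))]
        self.lines[address] = value
        return value


def run_loop(cache, loop_addresses, iterations):
    """Read the same addresses repeatedly, as a program loop would."""
    for _ in range(iterations):
        for address in loop_addresses:
            cache.read(address)
    return cache.cycles
```

Under these assumed latencies, a four-instruction loop executed ten times through an eight-entry cache costs 76 cycles (4 misses and 36 hits), against 400 cycles if every fetch went to main memory.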
Because it is small, a cache memory can be quickly filled. Once filled, storing more recently used memory items requires erasing less recently used memory items. The algorithm that determines when a newly requested memory item is to be stored in cache memory and what cache memory item it is to replace is critical to cache performance. A "least recently used" criterion can be used. However, in its simplest form, this criterion could lead to removal of all instructions stored in cache when a large data transfer occurs. For this reason, some systems, e.g., those employing a "Harvard architecture", use separate instruction and data caches so that data is replaced only by data and instructions are replaced only by instructions.
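The effect of a large data transfer on a unified "least recently used" cache, and the protection afforded by separate instruction and data caches, can be sketched as follows. This is a minimal model under stated assumptions: the names `LRUCache` and `surviving_instructions`, the cache sizes, and the transfer length of 100 items are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache; keys are (kind, address) pairs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used entry first

    def access(self, key):
        """Return True on a hit; on a miss, store key, evicting the LRU entry if full."""
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # erase least recently used item
        self.lines[key] = True
        return False


def surviving_instructions(split):
    """Count cached instructions left after a large data transfer."""
    instructions = [("I", a) for a in range(4)]
    data = [("D", a) for a in range(100)]
    if split:
        # Separate caches: data can replace only data.
        icache, dcache = LRUCache(4), LRUCache(4)
    else:
        # Unified cache of the same total size.
        icache = dcache = LRUCache(8)
    for key in instructions:
        icache.access(key)
    for key in data:
        dcache.access(key)
    return sum(1 for key in instructions if key in icache.lines)
```

In this sketch, `surviving_instructions(split=False)` returns 0, since the transfer displaces every instruction from the unified cache, whereas `surviving_instructions(split=True)` returns 4.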
More sophisticated cache systems provide for finer differentiation of types of memory items. For example, a separate cache can be provided for branch target instructions (as in, for example, the AMD 29000 manufactured by Advanced Micro Devices, San Jose, Calif.). By default, instructions are executed in the order of their addresses in main memory. However, "branch" instructions either conditionally or unconditionally call for jumps to instructions out of sequence. Unless measures are taken to "protect" branch targets in a cache, they are likely to be replaced by general instructions before they are called for again. Providing a dedicated branch target cache solves this problem.
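The protective effect of a dedicated branch target cache can be illustrated with the following simplified sketch. It is not a model of the AMD 29000 or of any other device: the function names, the one-instruction-per-entry organization, and the trace format of (address, is_branch_target) pairs are assumptions made for the example.

```python
from collections import OrderedDict


def lru_access(cache, capacity, address):
    """Access an LRU cache held in an OrderedDict; return True on a hit."""
    hit = address in cache
    if hit:
        cache.move_to_end(address)
    else:
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict least recently used entry
        cache[address] = True
    return hit


def branch_target_hits(trace, general_capacity, target_capacity):
    """Count cache hits on branch targets.

    With target_capacity == 0 there is no dedicated branch target cache
    and every instruction competes for the general instruction cache.
    """
    general, targets = OrderedDict(), OrderedDict()
    hits = 0
    for address, is_branch_target in trace:
        if is_branch_target and target_capacity > 0:
            hit = lru_access(targets, target_capacity, address)
        else:
            hit = lru_access(general, general_capacity, address)
        if is_branch_target and hit:
            hits += 1
    return hits


# Hypothetical trace: a branch target, twenty sequential instructions,
# then the same branch target again.
example_trace = ([(1000, True)]
                 + [(a, False) for a in range(20)]
                 + [(1000, True)])
```

For `example_trace`, `branch_target_hits(example_trace, 8, 0)` returns 0 because the sequential instructions displace the target, while `branch_target_hits(example_trace, 6, 2)` returns 1 with the same total capacity.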
While caches are generally designed to operate invisibly to the program, some cache systems provide caches that are under program control. Thus a programmer can select certain frequently used instructions to remain relatively permanently in cache memory.
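One way such program control might be modeled is with a cache whose entries can be locked against replacement. The sketch below is hypothetical: the class `LockableCache` and its `lock` and `unlock` methods are invented for illustration and do not correspond to the interface of any particular cache system.

```python
from collections import OrderedDict


class LockableCache:
    """LRU cache in which a program can pin selected addresses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used entry first
        self.locked = set()         # addresses exempt from replacement

    def lock(self, address):
        """Load address if absent and exempt it from replacement."""
        if address not in self.lines:
            self._make_room()
            self.lines[address] = True
        self.locked.add(address)

    def unlock(self, address):
        """Make address eligible for replacement again."""
        self.locked.discard(address)

    def access(self, address):
        """Return True on a hit; on a miss, load address and return False."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        self._make_room()
        self.lines[address] = True
        return False

    def _make_room(self):
        """Evict the least recently used unlocked entry if the cache is full."""
        if len(self.lines) < self.capacity:
            return
        for victim in self.lines:
            if victim not in self.locked:
                del self.lines[victim]
                return  # stop iterating immediately after the deletion
        raise RuntimeError("cache full and every entry is locked")
```

In this model, an address locked by the program still hits after any number of intervening accesses to other addresses, at the cost of one fewer entry for everything else.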
Because total cache memory is limited, dedicating sections of cache memory to specific memory item types or uses can impair performance where the sizes of the sections are not well matched to the frequency of use of the respective item type. For example, a large branch instruction section can limit the size of the section for general instructions. This can result in non-optimal performance when the percentage of non-branch instructions is relatively high. In addition, cache segmentation can result in some redundancy and complexity where an instruction is called for (at different times) in sequence and as a branch target. In any event, cache design is a subject of intense effort, and further refinements are eagerly sought.
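The sensitivity of a segmented cache to the relative sizes of its sections can be illustrated by running one workload against two partitions of the same total capacity. The sketch below is a simplified model; the function names, the workload (one branch target followed by six sequential instructions, repeated ten times), and the eight-entry total are assumptions chosen for the example.

```python
from collections import OrderedDict


def lru_access(cache, capacity, address):
    """Access an LRU cache held in an OrderedDict; return True on a hit."""
    hit = address in cache
    if hit:
        cache.move_to_end(address)
    else:
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict least recently used entry
        cache[address] = True
    return hit


def total_hits(general_capacity, target_capacity, iterations=10):
    """Count hits for a loop with one branch target and six general instructions."""
    general, targets = OrderedDict(), OrderedDict()
    hits = 0
    for _ in range(iterations):
        # One branch target per iteration goes to the branch target section.
        hits += lru_access(targets, target_capacity, 100)
        # Six sequential instructions go to the general section.
        for address in range(6):
            hits += lru_access(general, general_capacity, address)
    return hits
```

With six general entries and two branch target entries, this workload yields 63 hits out of 70 accesses. With the sizes reversed (two general entries, six branch target entries), the six-instruction loop no longer fits in the general section and the count drops to 9, although the total capacity is unchanged.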