1. Field of the Invention
The invention relates generally to electronic circuitry and more particularly to digital circuit design techniques in semiconductor devices and computer systems.
2. Description of the Prior Art
Modern high-performance microprocessors employ fast local memory as a cache reservoir of the most often accessed instructions and data. The resulting reduction in effective main memory access times produces higher system performance. In many types of systems, one example being workstations, the parts count and system cost are secondary to performance. But in other applications, such as the embedded processors in laser printers, system cost is a prime consideration.
One such modern high-performance microprocessor is the Integrated Device Technology, Inc., IDT79R3000. The IDT79R3000 has an on-chip cache control that supports a "direct mapped" cache. In direct mapping, each word of the physical memory space is mapped into exactly one cache memory location. The "n" least significant bits (LSB) of the memory address are used to index the direct mapped cache to specify a unique word. In this context, "n" is the number of address lines used to access the cache memory. Since the R3000 can address 4 gigabytes ($2^{32}$ bytes) of memory, each cache location can contain the contents of any one of $2^{32-n}$ physical locations in main memory. To identify the page of physical memory from which a cached word originated, the word is "tagged" with the 32-n most significant bits (MSB) of its main memory address. Consequently, each cache memory location stores both the data and the upper address bits of the physical main memory location that the data came from.
The tag is subsequently compared to the main memory addresses being output by the microprocessor, and if the tag matches, a cache "hit" is said to have occurred. If a cache miss occurs, the data will not be found in the cache and an access of the main memory must be attempted instead. A cache hit means that data will be accessed faster because the duplicate data found in the cache memory will respond much faster than would the original data in main memory.
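The index/tag split and the hit test described above can be sketched as follows. This is an illustrative software model only, not the IDT79R3000's circuitry: the class and method names are hypothetical, and it assumes one 32-bit word per cache location (so the two lowest address bits are dropped from the index) and a per-location valid flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CacheLine:
    valid: bool = False  # assumed valid flag; set once the location is filled
    tag: int = 0         # upper 32-n bits of the main memory address
    data: int = 0        # the cached 32-bit word


class DirectMappedCache:
    """Hypothetical model of a direct mapped cache indexed by n address LSBs."""

    def __init__(self, n: int) -> None:
        self.n = n
        # One location per 32-bit word: 2**(n-2) words in a 2**n byte cache.
        self.lines: List[CacheLine] = [CacheLine() for _ in range(1 << (n - 2))]

    def split(self, address: int) -> Tuple[int, int]:
        """Return (index, tag) for a 32-bit byte address."""
        index = (address & ((1 << self.n) - 1)) >> 2  # n LSBs, word aligned
        tag = address >> self.n                       # remaining 32-n MSBs
        return index, tag

    def read(self, address: int) -> Optional[int]:
        """Return the cached word on a hit, or None on a miss."""
        index, tag = self.split(address)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return line.data
        return None

    def fill(self, address: int, data: int) -> None:
        """Store a word fetched from main memory together with its tag."""
        index, tag = self.split(address)
        self.lines[index] = CacheLine(valid=True, tag=tag, data=data)


cache = DirectMappedCache(n=12)          # 4 Kb cache
cache.fill(0x00001234, 0xDEADBEEF)
hit = cache.read(0x00001234)             # same index, same tag
miss = cache.read(0x00002234)            # same index, different tag
```

Two addresses that share their n low-order bits compete for the same cache location, which is why the stored tag is needed to tell them apart.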
The cache subsystem of a typical IDT79R3000 implementation is logically comprised of three parts: 1) memories containing data and/or instructions being cached from main memory, 2) memories which contain tags that indicate the page number of a cached word, and, 3) a comparator that compares each Physical Frame Number (PFN) generated by the on-chip Memory Management Unit (MMU) with the tag data read from the tag memory.
Specialized memories and extensive logic to control the cache are not required with the IDT79R3000 cache subsystem, because standard Static Random Access Memory (SRAM) components are adequate. The IDT79R3000 has separate control pins for the instruction cache controls: IRd, IWr, and IClk; and for the data cache controls: DRd, DWr, and DClk. This enables the separation of instruction and data caches without the need for external decoding logic.
The IDT79R3000 has overlapping, but separate, pins for the 6 most significant bits of its "AdrLo" port and the 6 least significant bits of the physical frame number (PFN). These pins enable each cache size to be independently varied from 4 Kb to 256 Kb. It also has a "Harvard" architecture which achieves high performance by overlapping load/store operations with the instruction fetch and execution.
A typical IDT79R3000's cache interface addressing is comprised of 10 bits of the virtual page address. Bits 0-1 of the offset are not connected to the cache memory since the IDT79R3000 accesses only 32 bit words from the cache. Because the IDT79R3000 has instructions to access bytes or half words from memory, a whole word is always read from the cache: during read operations the desired portion is automatically extracted, and during write operations the modified portion is merged in and the whole word is written back.
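The extract-on-read and merge-on-write behavior for bytes and half words can be illustrated with a short sketch. The function names are hypothetical, and the byte numbering (offset 0 is the least significant byte) is an assumption made for the example; the text above does not specify the byte order.

```python
WORD_MASK = 0xFFFFFFFF  # a cache word is 32 bits wide


def extract(word: int, byte_offset: int, size: int) -> int:
    """Return `size` bytes (1, 2 or 4) of `word`, starting at `byte_offset`.

    Assumes offset 0 selects the least significant byte.
    """
    shift = 8 * byte_offset
    mask = (1 << (8 * size)) - 1
    return (word >> shift) & mask


def merge(word: int, byte_offset: int, size: int, value: int) -> int:
    """Return `word` with `size` bytes at `byte_offset` replaced by `value`."""
    shift = 8 * byte_offset
    mask = ((1 << (8 * size)) - 1) << shift
    # Clear the target bytes, then insert the new value in their place.
    return ((word & ~mask) | ((value << shift) & mask)) & WORD_MASK


word = 0x11223344
low_byte = extract(word, 0, 1)          # byte read
high_half = extract(word, 2, 2)         # half word read
updated = merge(word, 1, 1, 0xAB)       # byte write: whole word is rewritten
```

In both directions the cache itself only ever sees a full 32-bit word, which is why address bits 0-1 need not reach the cache memory.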
The number of bits needed to tag memory pages is a function of the maximum cacheable memory size and the cache size. If a cache size of 4 Kb is used with the R3000 in a specific design and a cacheable memory space of 4 Gb is assumed, then the number of tag bits required is 20 (log base-2 of the quotient of the memory size divided by the cache size). In systems which do not utilize all of the 4 Gb address space, or in systems with more than 4 Kb of cache, fewer than 20 bits of cache tagging would be required. In such a case, the upper bits of the cache tag will always have the same value stored in them. But because the IDT79R3000 processor compares the PFN to all 20 tag bits, it is necessary to supply the processor with all unused bits as well.
The IDT79R3000 can accommodate many different cache sizes and system memory sizes, but at the expense of requiring wide cache words and therefore fast, costly SRAM. For example, in systems which implement larger caches, only 32-n bits (where n is log base-2 of the cache size) are actually required to identify the origin of the word from memory; however, the IDT79R3000 uses the whole 20-bit page number as a tag for all cache sizes. Thus, for larger caches, the IDT79R3000 compares more tag bits than would otherwise be needed to support a larger cache size.
In an application where the cache size is 128 Kb, the address for the cache will be 17 bits wide (n=17). All memory words of the selected page of the physical main memory with the same 17 LSB address bits (0-16) are respectively mapped into the same order within the cache. In this configuration, bits 12-16 are redundant if carried as tag bits as required by the R3000 tag comparator bus; these bits are actually used to address a unique cache memory location directly, and so do not need to be stored as a tag word. Notwithstanding the fact that only 15 bits are needed with a 128 Kb cache to "tag" the cache words (because 32-n=15), an IDT79R3000 must use all 20 bits of the page number to determine whether a cache hit has occurred.
Table I shows the redundant tag bits that will occur for several different possible cache sizes a systems designer might select:
TABLE I
______________________________________
Cache Size    AdrLo    Redundant address/tag bits
______________________________________
4Kb           11:2     none
8Kb           12:2     12
16Kb          13:2     12-13
32Kb          14:2     12-14
64Kb          15:2     12-15
128Kb         16:2     12-16
256Kb         17:2     12-17
______________________________________
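The pattern in Table I follows from the address widths: the 20-bit page number begins at address bit 12, while a cache of $2^{n}$ bytes is indexed by address bits 2 through n-1, so any bits from 12 up to n-1 appear in both. A minimal sketch that regenerates the table rows under that reading (the function names are hypothetical):

```python
from typing import List, Tuple

PAGE_OFFSET_BITS = 12  # the 20-bit page number occupies address bits 12-31


def cache_address_width(cache_bytes: int) -> int:
    """Return n = log2(cache size in bytes); the size must be a power of two."""
    n = cache_bytes.bit_length() - 1
    if cache_bytes <= 0 or cache_bytes != 1 << n:
        raise ValueError("cache size must be a power of two")
    return n


def adrlo_range(cache_bytes: int) -> Tuple[int, int]:
    """Return (high, low) AdrLo bit numbers used to index the cache."""
    return cache_address_width(cache_bytes) - 1, 2


def redundant_tag_bits(cache_bytes: int) -> List[int]:
    """Return the address bits that are both cache index and tag bits."""
    return list(range(PAGE_OFFSET_BITS, cache_address_width(cache_bytes)))


for kb in (4, 8, 16, 32, 64, 128, 256):
    size = kb * 1024
    high, low = adrlo_range(size)
    print(f"{kb}Kb  {high}:{low}  {redundant_tag_bits(size) or 'none'}")
```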
Further complicating the situation is the IDT79R3000's requirement that not only must the redundant tag bits, 12-16, be returned to the IDT79R3000, they must be properly toggled to correspond to the portion of the 128 Kb cache addressed by those bits. A simple tying high or low of these lines would not be possible, even if the IDT79R3000's tag pins were simple inputs and not input/outputs.
Since redundant bits are always generated as part of a cache address, it is possible to feed the bits back to the tag comparator inputs through a buffer--rather than more expensively storing their values in the cache in tag memories. It would be desirable to make a further parts cost savings by disabling the participation of these bits in the tag comparison process and thus avoid the use of latches altogether. In microprocessors, such as the IDT79R3000, this is not possible because the system designer does not have access to the necessary points within the semiconductor circuit itself.
Similarly, most systems implement a smaller main memory than the 4 Gb assumed by the R3000. In such systems, fewer tag bits should be required to uniquely map cache entries to their main memory origin. However, the R3000 still requires 20 tag bits to be supplied for comparison. One way to reduce a system's component parts count is to store the upper tag bits in an inexpensive register rather than in additional memories. Such a design choice could mean significant savings. As an example, a design with 4 Mb of cacheable memory and a 64 Kb cache would need eleven IDT6198 (16K x 4) memories: eight for data, one for parity and two for the tag. The tag memory would contain 6 tag bits (Ad16-Ad21, corresponding to the 1024 possible pages of the 4 Mb cacheable memory); a valid bit; and 2 tag parity bits. The high order tag bits (Ad22-Ad31) and the third tag parity bit (TagP2) would be supplied by a 10-bit register (e.g. IDT 74FCT821), which is loaded by the CPU and indicates where, within the 4 Gb address space, the 4 Mb of cacheable memory resides. The low-order tag bits 12-15 are provided to the processor by a buffer (e.g. IDT 74FCT244A), which feeds the redundant AdrLo bits into the processor tag comparator.
The superfluous, most significant tag address bits (with regard to main memory and cache size) can be calculated as follows:

$$\mathrm{SUPERFLUOUS\_TAG} = 32 - \log_2(\text{mem size})$$
The total number of unique tag address bits that are required with regard to main memory and cache size can be calculated as follows:

$$\mathrm{UNIQUE\_TAG} = \log_2(\text{mem size}) - \log_2(\text{cache size})$$
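As a worked check of the two formulas, the following sketch evaluates them for the 4 Mb cacheable memory and 64 Kb cache example given earlier, and for the full 4 Gb memory with a 4 Kb cache. The function names are hypothetical, and sizes are assumed to be powers of two expressed in bytes.

```python
ADDRESS_BITS = 32  # width of the physical address


def log2_exact(size_bytes: int) -> int:
    """Return log2 of a size that must be an exact power of two."""
    exponent = size_bytes.bit_length() - 1
    if size_bytes <= 0 or size_bytes != 1 << exponent:
        raise ValueError("size must be a power of two")
    return exponent


def superfluous_tag_bits(mem_bytes: int) -> int:
    """Most significant tag bits that never change: 32 - log2(mem size)."""
    return ADDRESS_BITS - log2_exact(mem_bytes)


def unique_tag_bits(mem_bytes: int, cache_bytes: int) -> int:
    """Tag bits that must be stored: log2(mem size) - log2(cache size)."""
    return log2_exact(mem_bytes) - log2_exact(cache_bytes)


MB, KB = 1 << 20, 1 << 10

# 4 Mb of cacheable memory with a 64 Kb cache.
superfluous = superfluous_tag_bits(4 * MB)      # 32 - 22 = 10 (Ad22-Ad31)
unique = unique_tag_bits(4 * MB, 64 * KB)       # 22 - 16 = 6  (Ad16-Ad21)

# Full 4 Gb address space with a 4 Kb cache: every tag bit is needed.
full_superfluous = superfluous_tag_bits(1 << 32)        # 0
full_unique = unique_tag_bits(1 << 32, 4 * KB)          # 20
```

In the first case the 10 superfluous bits and 6 unique bits account for 16 of the 20 tag bits the processor compares; the remaining 4 are the redundant bits 12-15 that Table I lists for a 64 Kb cache.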
In order to reduce the amount of local/cache memory, the systems designer must be able to select which address bits will participate in the comparison with data from the cache. Current cache controllers of microprocessors such as the IDT79R3000 do not accommodate such a selection. The problem to be solved, then, is how to allow the systems designer to program the tag bits without adding significant cost to the designer's system or to the manufacturer's production of the semiconductor chips, without degrading system performance, and without requiring software to participate in the programming. The solution will therefore achieve software independence across multiple systems with different memory configurations.