1. Field of the Invention
The invention relates to IBM PC AT-compatible computer architectures, and more particularly, to enhancements thereof for cache memory management.
2. Description of Related Art
The IBM PC AT computer architecture is an industry standard architecture for personal computers and is typically built around a CPU such as an 80286, 80386SX, 80386DX, or 80486 microprocessor manufactured by Intel Corporation. The CPU is coupled to a local bus, capable of performing memory accesses and data transfers at high rates of speed (i.e., on the order of 10-50 MHz with today's technology). The local bus includes 16 or 32 data lines, a plurality of memory address lines, and various control lines.
The typical IBM PC AT-compatible platform also includes DRAM main memory, and in many cases a timer, a real-time clock, and a cache memory, all coupled to the local bus.
The typical IBM PC AT-compatible computer also includes an I/O bus which is separate and distinct from the local bus. The I/O bus, which may be an AT bus, an ISA bus or an EISA bus, is coupled to the local bus via certain interface circuitry. The I/O bus includes 16 or 32 data lines, a plurality of I/O address lines, as well as control lines. The I/O address space is logically distinct from the memory address space and if the CPU desires to access an I/O address, it does so by executing a special I/O instruction. The interface circuitry recognizes the I/O signals thereby generated by the CPU, performs the desired operation over the I/O bus, and if appropriate, returns results to the CPU over the local bus.
In practice, some I/O addresses may reside physically on the local bus and some memory addresses may reside physically on the I/O bus. The interface circuitry is responsible for recognizing that a memory or I/O address access must be emulated by an access to the other bus, and is responsible for doing such emulation. For example, a ROM (or EPROM) BIOS may be physically on the I/O bus, but actually form part of the local memory address space. During system boot, when the CPU sends out a non-I/O address which is physically within the ROM BIOS, the interface circuitry recognizes such, enables a buffer which couples the address onto the I/O bus, and activates the chip select for the ROM. The interface circuitry then assembles a data word of the size expected by the CPU, from the data returned by the ROM, and couples the word onto the local bus for receipt by the CPU. In many systems, at some point during the ROM-based boot-up procedure, the ROM BIOS is copied into equivalent locations in the DRAM main memory and thereafter accessed directly. The portion of DRAM main memory which receives such portions of the BIOS is sometimes referred to as "shadow RAM."
More specifically, in the standard PC AT architecture, the logical main memory address space is divided into a low memory range (0h-9FFFFh), a reserved memory range (A0000h-FFFFFh) and an extended memory range (100000h-FFFFFFh). In a typical system the system ROM BIOS is located logically at addresses F0000h-FFFFFh, and is located physically on the I/O bus. Additional system ROM BIOS may be located in expansion sockets at addresses E0000h-EFFFFh, physically located on the I/O bus. Addresses C0000h-EFFFFh contain ROM BIOS portions for specific add-on cards and are located physically on their respective cards on the I/O bus. Addresses A0000h-BFFFFh contain the video buffer, located physically on a video controller on the I/O bus. The video buffer may be accessible directly over the local bus. Duplicate memory space is typically provided in DRAM on the local bus for addresses C0000h-FFFFFh, and the user of the system can select which portions of the ROM BIOS are to be "shadowed" by being copied into the duplicate DRAM space during boot-up. Subsequent accesses to "shadowed" portions of the BIOS are to the DRAM copy, which is typically much faster than accesses to the ROM copy. As used herein, the term "secondary memory" refers to any storage elements present in the system, which are accessible in the main memory address space.
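The address ranges above can be summarized as a small lookup table. The following is a minimal sketch only: the table and the function name `classify_address` are hypothetical, and the add-on card ROM range is taken here as C0000h-DFFFFh (the range the POST is described as scanning) so that it does not overlap the expansion system BIOS range, which is an assumption made for illustration.

```python
# Illustrative map of the PC AT main memory address space described above.
# Each entry is (first address, last address, description). The split of
# C0000h-EFFFFh into add-on ROM and expansion system BIOS is an assumption.
REGIONS = [
    (0x000000, 0x09FFFF, "low memory"),
    (0x0A0000, 0x0BFFFF, "video buffer"),
    (0x0C0000, 0x0DFFFF, "add-on card ROM BIOS"),
    (0x0E0000, 0x0EFFFF, "expansion system ROM BIOS"),
    (0x0F0000, 0x0FFFFF, "system ROM BIOS"),
    (0x100000, 0xFFFFFF, "extended memory"),
]

# Range for which duplicate DRAM is typically provided for shadowing.
SHADOW_FIRST, SHADOW_LAST = 0x0C0000, 0x0FFFFF


def classify_address(addr):
    """Return the name of the region containing a 24-bit memory address."""
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name
    raise ValueError("address outside the 16 Mbyte PC AT memory space")


def is_shadowable(addr):
    """True if the address lies in the range that may be copied to DRAM."""
    return SHADOW_FIRST <= addr <= SHADOW_LAST
```

For example, `classify_address(0xF0000)` falls in the system ROM BIOS range, and `is_shadowable(0xF0000)` is true because that range has duplicate DRAM behind it.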
When an Intel 80x86 microprocessor first powers up, it begins by executing the instruction located 16 bytes from the highest memory address. For the 8086/8088, this address is FFFF0h. For the 80286, it is FFFFF0h, for the 80386 it is FFFFFFF0h, and for the 80486 it is FFFFFFF0h. Typical IBM PC AT-compatible systems have a jump instruction at this address, to the beginning of a power-on self-test (POST) routine in the system ROM BIOS. The POST tests the microprocessor, memory, and other hardware components for presence and reliability, and also initializes various interrupt vector table entries with default values pointing to handler routines within the system BIOS.
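The power-up addresses listed above all follow from the width of the processor's physical address. As a worked illustration, the sketch below computes the address 16 bytes below the top of the address space; the function name `reset_vector` and the table `ADDRESS_BITS` are hypothetical names introduced here.

```python
# Physical address width, in bits, of each processor named in the text.
ADDRESS_BITS = {"8086/8088": 20, "80286": 24, "80386": 32, "80486": 32}


def reset_vector(address_bits):
    """Address of the first instruction fetched after power-up:
    16 bytes below the top of a 2**address_bits byte address space."""
    return (1 << address_bits) - 16


for cpu, bits in ADDRESS_BITS.items():
    print(f"{cpu}: {reset_vector(bits):X}h")
```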
As part of its duties, the POST scans for add-on ROM BIOS modules beginning at every 2k byte increment from address C0000h to DFFFFh. At each increment, it checks for a signature of 55h at offset 0, and AAh at offset 1 to indicate a valid add-on ROM BIOS module. The byte at offset 2 then contains the length of the BIOS module (measured in 512 byte blocks), and offset 3 begins the executable code for the module. The POST performs a checksum on all the bytes in the module, which should always yield a value of 00h in each of the low-order two bytes, and then executes a "far call" instruction to the offset 3 byte to permit the module to perform its own initialization. The module executes a "far return" instruction to return to the POST. The portion of the POST which checks for ROM BIOS modules is known as BIOS sizing.
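The scan described above can be sketched as follows. This is an illustration under stated assumptions rather than actual POST code: `memory` is a hypothetical bytes-like image indexed by physical address, the function name `find_rom_modules` is invented here, and the checksum test uses the conventional rule that the byte sum of the module is zero modulo 256. The far call into each module is omitted.

```python
def find_rom_modules(memory, start=0xC0000, end=0xDFFFF, step=2048):
    """Scan every 2k byte increment for an add-on ROM BIOS signature.

    Returns a list of (address, length_in_bytes, checksum_ok) tuples.
    `memory` is a hypothetical bytes-like image indexed by address.
    """
    modules = []
    for addr in range(start, end + 1, step):
        # Signature: 55h at offset 0 and AAh at offset 1.
        if memory[addr] != 0x55 or memory[addr + 1] != 0xAA:
            continue
        # Offset 2 holds the module length in 512 byte blocks.
        length = memory[addr + 2] * 512
        # Assumed rule: all bytes of the module sum to zero modulo 256.
        checksum_ok = sum(memory[addr:addr + length]) & 0xFF == 0
        modules.append((addr, length, checksum_ok))
    return modules
```

A real POST would follow a successful checksum with a far call to offset 3 of the module; that step has no meaningful equivalent in this sketch.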
The BIOS sizing operation in the POST also checks for an expansion system BIOS ROM in the range E0000h-EFFFFh. The POST checks for a valid signature at offsets 0 and 1 at each increment, performs a checksum verification for the modules it finds, and executes a far call to the offset 3 byte of the module to permit the module to perform its own initialization. An overall checksum verification is also performed on the main system BIOS range F0000h-FFFFFh.
The POST also checks the integrity of all the memory in the system from address 0h to 9FFFFh and 100000h to the top of memory by writing known data to these addresses and then reading it back. The POST also checks the integrity of whatever video memory is present in addresses A0000h-BFFFFh by the same or a similar method.
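The write-then-read-back test described above can be sketched as follows. The test patterns 55h and AAh, the function name `check_memory_range`, and the `StuckAtZeroMemory` fault model are all assumptions made for illustration; the text does not say which data the POST writes.

```python
class StuckAtZeroMemory:
    """Hypothetical memory model in which bit 0 of one address
    always reads back as 0, regardless of what was written."""

    def __init__(self, size, bad_addr):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr

    def __setitem__(self, addr, value):
        self.cells[addr] = value

    def __getitem__(self, addr):
        value = self.cells[addr]
        return value & 0xFE if addr == self.bad_addr else value


def check_memory_range(mem, start, end, patterns=(0x55, 0xAA)):
    """Write each known pattern to every address in start..end (inclusive)
    and read it back. Returns the list of addresses that failed."""
    bad = []
    for addr in range(start, end + 1):
        for pattern in patterns:
            mem[addr] = pattern
            if mem[addr] != pattern:
                bad.append(addr)
                break
    return bad
```

With a fault-free `bytearray` the function returns an empty list; with `StuckAtZeroMemory(64, 10)` it reports address 10, because the pattern 55h reads back as 54h there.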
In addition to the above elements of a standard PC AT-compatible system, a keyboard controller typically is also coupled to the I/O bus, as is a video display controller. A typical IBM PC AT-compatible system may also include a DMA controller which permits peripheral devices on the I/O bus to read or write directly to or from main memory, as well as an interrupt controller for transmitting interrupts from various add-on cards to the CPU. The add-on cards are cards which may be plugged into slot connectors coupled to the I/O bus to increase the capabilities of the system.
General information on the various forms of IBM PC AT-compatible computers can be found in IBM, "PC/AT Technical Reference Manual" (1985); Sanchez, "IBM Microcomputers: A Programmer's Handbook" (McGraw-Hill: 1990) and Solari, "AT Bus Design" (San Diego: Annabooks, 1990). See also the various data books and data sheets published by Intel Corporation concerning the structure and use of the iAPX-86 family of microprocessors, including the "386 DX Microprocessor", data sheet, published by Intel Corporation (1990), and "i486.TM. Processor Hardware Reference Manual", published by Intel Corporation (1990). All the above references are incorporated herein by reference.
Recently, efforts have been made to reduce the size and improve the manufacturability of PC AT-compatible computers. Specifically, efforts have been made to minimize the number of integrated circuit chips required to build such a computer. Several manufacturers have developed "PC AT chipsets", which integrate a large amount of the I/O interface circuitry and other circuitry onto only a few chips. An example of such a chipset for an ISA architecture is the 386WT PC/AT chipset manufactured by OPTi, Inc., Santa Clara, Calif., made up of the OPTi 82C381, 82C382 and 82C206. Examples of such a chipset for an EISA architecture are described in Intel, "82350 EISA Chip set" (1990) and in Intel, "82350DT EISA Chip Set" (1992), both available from Intel Corp., Santa Clara, Calif.; and in Buchanan, "A Highly Integrated VLSI Chip Set For EISA System Design," [Need Journal and date], pp. 293-[?][this is the article re TI chipset].
Several of these chipsets, including the 386 WT chipset, implement a direct mapped cache memory to improve performance. The use of a small, high speed cache in a computer design permits the use of relatively slow but inexpensive DRAM for the large main memory space, by taking advantage of the "property of temporal locality," i.e., the property inherent in most computer programs wherein a memory location referenced at one point in time is very likely to be referenced again soon thereafter. Descriptions of the various uses of and methods of employing caches appear in the following articles: Kaplan, "Cache-based Computer Systems," Computer, March, 1973 at 30-36; Rhodes, "Caches Keep Main Memories From Slowing Down Fast CPUs," Electronic Design, Jan. 21, 1982, at 179; Strecker, "Cache Memories for PDP-11 Family Computers," in Bell, "Computer Engineering" (Digital Press), at 263-67, all incorporated herein by reference. See also the description at pp. 6-1 through 6-11 of the "i486 Processor Hardware Reference Manual" mentioned above.
In general, a direct mapped cache memory comprises a high speed data RAM and a parallel high speed tag RAM. The RAM address of each line in the data cache is the same as the low-order portion of the main memory line address to which the entry corresponds, the high-order portion of the main memory address being stored in the tag RAM. Thus, if main memory is thought of as 2^m blocks of 2^n "lines" of one or more bytes each, the i'th line in the cache data RAM will be a copy of the i'th line of one of the 2^m blocks in main memory. The identity of the main memory block that the line came from is stored in the i'th location in the tag RAM. Tag RAM typically also contains a "valid" bit corresponding to each entry, indicating whether the tag and data in that entry are valid.
When a CPU requests data from memory, the low-order portion of the line address is supplied as an address to both the cache data and cache tag RAMs. The tag for the selected cache entry is compared with the high-order portion of the CPU's address and, if it matches, then a "cache hit" is indicated and the data from the cache data RAM is enabled onto the data bus. If the tag does not match the high-order portion of the CPU's address, or the tag data is invalid, then a "cache miss" is indicated and the data is fetched from main memory. It is also placed in the cache for potential future use, overwriting the previous entry. Typically, an entire line is read from main memory and placed in the cache on a cache miss, even if only a byte is requested. On a data write from the CPU, either the cache RAM or main memory or both may be updated, it being understood that flags may be necessary to indicate to one that a write has occurred in the other.
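The read path just described can be modeled in a few lines. The following is a behavioral sketch only, not a description of any particular chipset: the class name `DirectMappedCache`, its parameters, and the representation of main memory as a bytes-like object are assumptions, and writes are not modeled.

```python
class DirectMappedCache:
    """Behavioral model of a direct mapped cache (read path only)."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [0] * num_lines           # tag RAM
        self.valid = [False] * num_lines      # valid bit per entry
        self.data = [bytes(line_size)] * num_lines  # data RAM

    def read(self, addr, main_memory):
        """Return (byte, hit) for a one-byte read at `addr`."""
        line_addr = addr // self.line_size
        index = line_addr % self.num_lines    # low-order part: RAM address
        tag = line_addr // self.num_lines     # high-order part: stored tag
        offset = addr % self.line_size
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Cache miss: fetch the entire line and overwrite the entry.
            base = line_addr * self.line_size
            self.data[index] = bytes(main_memory[base:base + self.line_size])
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset], hit
```

With 4 lines of 16 bytes, addresses 20h and 60h share cache line 2 but carry different tags, so reading one after the other evicts the first, which is the behavior the paragraph above describes as overwriting the previous entry.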
Accordingly, in a direct mapped cache, each "line" of secondary memory can be mapped to one and only one line in the cache. In a "fully associative" cache, a particular line of secondary memory may be mapped to any of the lines in the cache; in this case, in a cacheable access, all of the tags must be compared to the address in order to determine whether a cache hit or miss has occurred. "k-way set associative" cache architectures also exist which represent a compromise between direct mapped caches and fully associative caches. In a k-way set associative cache architecture, each line of secondary memory may be mapped to any of k lines in the cache. In this case, k tags must be compared to the address during a cacheable secondary memory access in order to determine whether a cache hit or miss has occurred. Aspects of the present invention apply to each of the above cache architectural variations. Caches may also be "sector buffered" or "sub-block" type caches, in which several cache data lines, each with its own valid bit, correspond to a single cache tag RAM entry. Aspects of the invention may apply to sector buffered caches as well, especially to the extent that elimination of the valid bits effectively converts such caches to non-sector buffered caches with a line size equal to the former sector size.
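The three mappings differ only in how many cache lines are candidates for a given line of secondary memory, and therefore in how many tags must be compared. The sketch below illustrates this; the function name `candidate_lines` and the layout of sets as consecutive groups of k lines are assumptions chosen for illustration.

```python
def candidate_lines(line_addr, num_lines, ways):
    """Return the cache line numbers that may hold memory line `line_addr`.

    ways == 1          -> direct mapped (exactly one candidate)
    ways == num_lines  -> fully associative (every line is a candidate)
    otherwise          -> k-way set associative with k == ways
    """
    if num_lines % ways != 0:
        raise ValueError("num_lines must be a multiple of ways")
    num_sets = num_lines // ways
    set_index = line_addr % num_sets
    # Assumed layout: set s occupies lines s*ways .. s*ways + ways - 1.
    return [set_index * ways + way for way in range(ways)]
```

The length of the returned list is the number of tag comparisons needed on a cacheable access: one for a direct mapped cache, k for a k-way set associative cache, and all of the lines for a fully associative cache.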
In PC AT-compatible computers, the chipset performs all the management functions for the cache, while the cache data memory itself is located in SRAM off-chip. The tag memory is also located off-chip in a tag RAM. The user can specify, through a user setup program which programs registers in the chipset, which memory address ranges are to be cacheable and which are not. Due to the special nature of addresses A0000h-BFFFFh and C8000h-FFFFFh, these addresses are never cacheable in a typical chipset.
On system power-up, the external cache data and tag RAM both contain random data, including in the valid bit. Unless special precautions are taken, therefore, one or more lines of random data in the cache erroneously may appear to the chipset to contain valid information. One solution to this problem might be to use a dedicated tag RAM chip which has a "flush" pin. The CY7B181 chip manufactured by Cypress Semiconductor Corp. is one such chip. The flush pin would be connected to the system reset line to force the tag RAM to invalidate all its entries before the first instruction fetch by the CPU. Dedicated tag RAM chips are expensive, however, and preferably avoided in PC AT-compatible computers.
In some chipsets, the problem is solved using standard SRAM chips to store tag RAM. These chipsets power up with cacheing disabled, and special routines in the setup program, or in a driver, invalidate each cache tag entry before enabling cacheing. Since the tag RAM is not directly accessible by the CPU in PC AT architectures, however, this technique usually requires the provision of special registers in the chipset through which the accesses can be made. It also requires specialized setup program code to accomplish the flush, which is undesirable since industry standard BIOS ROMs generally cannot be used. The technique also imposes a small time delay in the boot procedure which it would be desirable to avoid.
In the 386 WT chipset, a dedicated tag RAM was used which included an "invalidate" input pin to clear the valid bit for the entry currently being addressed. The chipset itself included an "invalidate" output for connection to that pin, and the chipset solved the power-up cache-flush problem by powering up in a default state with cacheing disabled and including logic to activate the invalidate output whenever cacheing was disabled. Thus, when the POST performed its standard memory test operation, which included (among other things) reads from all the bytes in a memory address range much larger than the maximum allowed cache size of 256 k bytes, all the tag RAM entries were invalidated automatically. This solution avoided the need for any special setup program routines, but still required the use of expensive dedicated tag RAM.
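The effect of this arrangement can be simulated as follows. This is a sketch under stated assumptions: the 16 byte line size, the random model of power-up valid bits, and all function names are hypothetical, and only the 256 k byte maximum cache size and the 0h-9FFFFh memory test range come from the text. One access per line is modeled, since reading every byte of a line addresses the same tag entry.

```python
import random

LINE_SIZE = 16                      # bytes per cache line (assumed)
CACHE_SIZE = 256 * 1024             # maximum cache size given in the text
NUM_LINES = CACHE_SIZE // LINE_SIZE


def power_up_valid_bits(num_lines, seed=0):
    """Model the tag RAM at power-up: each valid bit holds a random value."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(num_lines)]


def read_with_caching_disabled(valid_bits, addr, line_size=LINE_SIZE):
    """With caching disabled, the invalidate output is active, so the
    tag entry addressed by this read has its valid bit cleared."""
    index = (addr // line_size) % len(valid_bits)
    valid_bits[index] = False


def post_memory_sweep(valid_bits, start, end, line_size=LINE_SIZE):
    """Model the POST memory test reading addresses start..end-1."""
    for addr in range(start, end, line_size):
        read_with_caching_disabled(valid_bits, addr, line_size)


valid = power_up_valid_bits(NUM_LINES)
post_memory_sweep(valid, 0x00000, 0xA0000)   # low memory test, 640 k bytes
print(any(valid))                            # no entry remains valid
```

Because the swept range is larger than the cache, every tag entry is addressed at least once. A sweep shorter than the cache size would leave some entries untouched, which is why the size of the memory test range matters to this technique.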
Another problem which occurs in PC AT-compatible computers arises because there is no way to directly read or write information in the cache tag RAM. In the typical PC AT-compatible computer, the data pins of the tag RAM are permanently coupled to receive input from the high-order address leads of the local bus as explained above, and are permanently coupled to provide output to a tag match comparator. Cache tag entries have no corresponding address in the main memory or I/O address space. For diagnostic purposes, however, it would be desirable to be able to write any desired data to a selected tag RAM entry, and also to read the data currently in a tag RAM entry. This capability would be desirable so that, for example, the POST could test and size the cache much as it does for DRAM main memory.