1. Field of the Invention
The invention relates to IBM PC AT-compatible computer architectures, and more particularly, to enhancements thereof for power-on cache flushing.
2. Description of Related Art
The IBM PC AT (Trademark of IBM Corp.) computer architecture is an industry standard architecture for personal computers and is typically built around a Central Processing Unit (CPU) such as an 80286, 80386SX, 80386DX, or 80486 microprocessor manufactured by Intel Corporation. The CPU is coupled to a local bus, capable of performing memory accesses and data transfers at high rates of speed (i.e., on the order of 10-50 MHz with today's technology). The local bus includes 16 or 32 data lines, a plurality of memory address lines, and various control lines.
The typical IBM PC AT-compatible platform also includes Dynamic Random Access Memory (DRAM) main memory, and in many cases a timer, a real-time clock, and a cache memory, all coupled to the local bus.
The typical IBM PC AT-compatible computer also includes an Input/Output (I/O) bus which is separate and distinct from the local bus. The I/O bus, sometimes referred to in these systems as an AT bus, an Industry Standard Architecture (ISA) bus or an Extended Industry Standard Architecture (EISA) bus, is coupled to the local bus via certain interface circuitry. The I/O bus includes 16 or 32 data lines, a plurality of I/O address lines, as well as control lines. The I/O address space is logically distinct from the memory address space and if the CPU desires to access an I/O address, it does so by executing a special I/O instruction. The interface circuitry recognizes the I/O signals thereby generated by the CPU, performs the desired operation over the I/O bus, and if appropriate, returns results to the CPU over the local bus.
In practice, some I/O addresses may reside physically on the local bus and some memory addresses may reside physically on the I/O bus. The interface circuitry is responsible for recognizing that a memory or I/O address access must be emulated by an access to the other bus, and is responsible for doing such emulation. For example, a Read Only Memory (ROM) (or Erasable Programmable ROM (EPROM)) Basic Input Output System (BIOS) may be physically on the I/O bus, but actually form part of the local memory address space. During system boot, when the CPU sends out a non-I/O address which is physically within the ROM BIOS, the interface circuitry recognizes such, enables a buffer which couples the address onto the I/O bus, and activates the chip select for the ROM. The interface circuitry then assembles a data word of the size expected by the CPU, from the data returned by the ROM, and couples the word onto the local bus for receipt by the CPU. In many systems, at some point during the ROM-based boot-up procedure, the ROM BIOS is copied into equivalent locations in the DRAM main memory and thereafter accessed directly. The portion of DRAM main memory which receives such portions of the BIOS is sometimes referred to as "shadow Random Access Memory (RAM)".
More specifically, in the standard PC AT architecture, the logical main memory address space is divided into a low memory range (0h-9FFFFh), a reserved memory range (A0000h-FFFFFh) and an extended memory range (100000h-FFFFFFh). In a typical system the system ROM BIOS is located logically at addresses F0000h-FFFFFh, and is located physically on the I/O bus. Additional system ROM BIOS may be located in expansion sockets at addresses E0000h-EFFFFh, physically located on the I/O bus. Addresses C0000h-EFFFFh contain ROM BIOS portions for specific add-on cards and are located physically on their respective cards on the I/O bus. Addresses A0000h-BFFFFh contain the video buffer, located physically on a video controller on the I/O bus. Duplicate memory space is typically provided in DRAM on the local bus for addresses C0000h-FFFFFh, and the user of the system can select which portions of the ROM BIOS are to be "shadowed" by being copied into the duplicate DRAM space during boot-up. Subsequent accesses to "shadowed" portions of the BIOS are to the DRAM copy, which is typically much faster than accesses to the ROM copy.
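The address map just described can be summarized as a small lookup table. The following is only an illustrative sketch: the function name `classify_address` and the region labels are hypothetical, and the C0000h-EFFFFh range is split at E0000h to reflect the overlap between add-on card BIOS and expansion system BIOS noted above.

```python
# Inclusive address ranges taken from the description above.
# The labels and the function name are illustrative assumptions.
REGIONS = [
    (0x000000, 0x09FFFF, "low memory"),
    (0x0A0000, 0x0BFFFF, "video buffer"),
    (0x0C0000, 0x0DFFFF, "add-on card ROM BIOS"),
    (0x0E0000, 0x0EFFFF, "expansion system ROM BIOS or add-on card ROM BIOS"),
    (0x0F0000, 0x0FFFFF, "system ROM BIOS"),
    (0x100000, 0xFFFFFF, "extended memory"),
]

# Only this range has duplicate DRAM that may be selected for shadowing.
SHADOW_START, SHADOW_END = 0x0C0000, 0x0FFFFF


def classify_address(addr):
    """Return the name of the region that contains a 24-bit memory address."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    raise ValueError("address outside the 24-bit PC AT memory space")


def is_shadowable(addr):
    """True if the address lies in the range that can be shadowed in DRAM."""
    return SHADOW_START <= addr <= SHADOW_END
```

For example, `classify_address(0xF0000)` falls in the system ROM BIOS range, and `is_shadowable(0xA0000)` is false because the video buffer lies outside C0000h-FFFFFh.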
When an Intel 80x86 microprocessor first powers up, it begins by executing the instruction located 16 bytes below the top of its physical address space. For the 8086/8088, this address is FFFF0h. For the 80286, it is FFFFF0h, and for the 80386 and 80486 it is FFFFFFF0h. Typical IBM PC AT-compatible systems have a jump instruction at this address, to the beginning of a power-on self-test (POST) routine in the system ROM BIOS. The POST tests the microprocessor, memory, and other hardware components for presence and reliability, and also initializes various interrupt vector table entries with default values pointing to handler routines within the system BIOS.
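The power-up addresses listed above all follow from the width of the address bus. As a brief sketch (the function name `reset_vector` is hypothetical, and the address widths of 20, 24 and 32 bits are the commonly documented values for these processors rather than figures stated in this description):

```python
def reset_vector(address_bits):
    """Address of the first instruction fetched: 16 bytes below the top
    of a physical address space that is `address_bits` wide."""
    return (1 << address_bits) - 16


# Address-bus widths assumed here: 8086/8088 = 20, 80286 = 24, 80386/80486 = 32.
for name, bits in [("8086/8088", 20), ("80286", 24), ("80386/80486", 32)]:
    print(f"{name}: {reset_vector(bits):X}h")
```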
As part of its duties, the POST scans for add-on ROM BIOS modules beginning at every 2k byte increment from address C0000h to DFFFFh. At each increment, it checks for a signature of 55h at offset 0, and AAh at offset 1 to indicate a valid add-on ROM BIOS module. The byte at offset 2 then contains the length of the BIOS module (measured in 512 byte blocks), and offset 3 begins the executable code for the module. The POST performs a checksum on all the bytes in the module, which should always yield a value of 00h in each of the low order two bytes, and then executes a "far call" instruction to the offset 3 byte to permit the module to perform its own initialization. The module executes a "far return" instruction to return to the POST. The portion of the POST which checks for ROM BIOS modules is known as BIOS sizing.
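The scan described above can be sketched as follows. This is an illustration under stated assumptions, not the code of any actual BIOS: the names `scan_addon_roms` and `build_test_module` are hypothetical, the checksum is taken to mean that the low-order 8 bits of the byte sum are zero, and the far call into each module is represented only by returning the entry address.

```python
SCAN_START = 0xC0000   # first address scanned, per the text
SCAN_END = 0xDFFFF     # last address scanned, per the text
SCAN_STEP = 2 * 1024   # 2k byte increment
BLOCK_SIZE = 512       # unit of the length byte at offset 2


def scan_addon_roms(window):
    """Return (address, length_in_bytes, entry_address) for each valid module.

    `window` holds the bytes found at addresses SCAN_START..SCAN_END.
    """
    found = []
    for offset in range(0, len(window) - 2, SCAN_STEP):
        # Signature check: 55h at offset 0 and AAh at offset 1.
        if window[offset] != 0x55 or window[offset + 1] != 0xAA:
            continue
        length = window[offset + 2] * BLOCK_SIZE
        module = window[offset:offset + length]
        if length == 0 or len(module) < length:
            continue  # empty module, or one that runs past the scanned range
        # Assumed checksum rule: the byte sum must be zero modulo 256.
        if sum(module) % 256 == 0:
            found.append((SCAN_START + offset, length, SCAN_START + offset + 3))
    return found


def build_test_module(blocks):
    """Build a dummy module whose code is a single far return (opcode CBh)."""
    module = bytearray(blocks * BLOCK_SIZE)
    module[0:4] = bytes([0x55, 0xAA, blocks, 0xCB])
    module[-1] = (-sum(module)) % 256  # pad byte that zeroes the checksum
    return bytes(module)
```

A real POST would execute a far call to each returned entry address; the sketch stops at locating and validating the modules.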
The BIOS sizing operation in the POST also checks for an expansion system BIOS ROM in the range E0000h-EFFFFh. The POST checks for a valid signature at offsets 0 and 1 at each increment, performs a checksum verification for the modules it finds, and executes a far call to the offset 3 byte of the module to permit the module to perform its own initialization. An overall checksum verification is also performed on the main system BIOS range F0000h-FFFFFh.
The POST also checks the integrity of all the memory in the system from address 0h to 9FFFFh and 100000h to the top of memory by writing known data to these addresses and then reading it back. The POST also checks the integrity of whatever video memory is present in addresses A0000h-BFFFFh by the same or a similar method.
In addition to the above elements of a standard PC AT-compatible system, a keyboard controller typically is also coupled to the I/O bus, as is a video display controller. A typical IBM PC AT-compatible system may also include a Direct Memory Access (DMA) controller which permits peripheral devices on the I/O bus to read or write directly to or from main memory, as well as an interrupt controller for transmitting interrupts from various add-on cards to the CPU. The add-on cards are cards which may be plugged into slot connectors coupled to the I/O bus to increase the capabilities of the system.
General information on the various forms of IBM PC AT-compatible computers can be found in IBM, "PC/AT Technical Reference Manual" (1985); Sanchez, "IBM Microcomputers: A Programmer's Handbook" (McGraw-Hill: 1990) and Solari, "AT Bus Design" (San Diego: Annabooks, 1990). See also the various data books and data sheets published by Intel Corporation concerning the structure and use of the iAPX-86 family of microprocessors, including the "386 DX Microprocessor", data sheet, published by Intel Corporation (1990). All the above references are incorporated herein by reference.
Recently, efforts have been made to reduce the size and improve the manufacturability of PC AT-compatible computers. Specifically, efforts have been made to minimize the number of integrated circuit chips required to build such a computer. Several manufacturers have developed "PC AT chipsets", which integrate a large amount of the I/O interface circuitry and other circuitry onto only a few chips. An example of such a chipset is the 386WT PC/AT chipset manufactured by OPTi, Inc., Santa Clara, Calif., made up of the OPTi 82C381, 82C382 and 82C206.
Several of these chipsets, including the 386WT chipset, implement a direct mapped cache memory to improve performance. The use of a small, high speed cache in a computer design permits the use of relatively slow but inexpensive DRAM for the large main memory space, by taking advantage of the "property of temporal locality," i.e., the property inherent in most computer programs wherein a memory location referenced at one point in time is very likely to be referenced again soon thereafter. Descriptions of the various uses of and methods of employing caches appear in the following articles: Kaplan, "Cache-based Computer Systems," Computer, 3/73 at 30-36; Rhodes, "Caches Keep Main Memories From Slowing Down Fast CPUs," Electronic Design, Jan. 21, 1982, at 179; Strecker, "Cache Memories for PDP-11 Family Computers," in Bell, "Computer Engineering" (Digital Press), at 263-67, all incorporated herein by reference.
In general, a direct mapped cache memory comprises a high speed data RAM and a parallel high speed tag RAM. The RAM address of each line in the data cache is the same as the low order portion of the main memory line address to which the entry corresponds, the high order portion of the main memory address being stored in the tag RAM. Thus, if main memory is thought of as 2^m blocks of 2^n "lines" of one or more bytes each, the i'th line in the cache data RAM will be a copy of the i'th line of one of the 2^m blocks in main memory. The identity of the main memory block that the line came from is stored in the i'th location in the tag RAM. Tag RAM typically also contains a "valid" bit corresponding to each entry, indicating whether the tag and data in that entry are valid.
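The division of an address into block and line portions can be sketched as below. The function names and parameters are hypothetical, and the byte offset within a line is an added assumption covering lines of more than one byte.

```python
def split_address(addr, index_bits, offset_bits):
    """Split a byte address into (tag, index, offset).

    index_bits  -- n, so that each block holds 2**n lines
    offset_bits -- log2 of the line size in bytes (assumed parameter)
    The tag is the block number, i.e. the high order portion kept in tag RAM.
    """
    offset = addr & ((1 << offset_bits) - 1)
    line_address = addr >> offset_bits
    index = line_address & ((1 << index_bits) - 1)
    tag = line_address >> index_bits
    return tag, index, offset


def join_address(tag, index, offset, index_bits, offset_bits):
    """Inverse of split_address: rebuild the byte address."""
    return (((tag << index_bits) | index) << offset_bits) | offset
```

With 1024 lines of 4 bytes each (`index_bits=10`, `offset_bits=2`), address 12345h splits into block 18, line 209 and byte offset 1.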
When a CPU requests data from memory, the low order portion of the line address is supplied as an address to both the cache data and cache tag RAMs. The tag for the selected cache entry is compared with the high order portion of the CPU's address and, if it matches, then a "cache hit" is indicated and the data from the cache data RAM is enabled onto the data bus. If the tag does not match the high order portion of the CPU's address, or the tag data is invalid, then a "cache miss" is indicated and the data is fetched from main memory. It is also placed in the cache for potential future use, overwriting the previous entry. Typically, an entire line is read from main memory and placed in the cache on a cache miss, even if only a byte is requested. On a data write from the CPU, either the cache RAM or main memory or both may be updated, it being understood that flags may be necessary to indicate to one that a write has occurred in the other.
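The read path described above can be modeled in a few lines. This is a minimal sketch, assuming a read-only backing store and a hypothetical class name `DirectMappedCache`; it omits writes and does not describe any particular chipset.

```python
class DirectMappedCache:
    """Toy model of a direct mapped cache in front of a byte-addressed memory."""

    def __init__(self, index_bits, line_bytes, memory):
        self.index_bits = index_bits
        self.line_bytes = line_bytes
        self.memory = memory                      # bytes-like main memory
        lines = 1 << index_bits
        self.valid = [False] * lines              # valid bit per entry
        self.tags = [0] * lines                   # tag RAM
        self.data = [bytes(line_bytes)] * lines   # data RAM

    def read(self, addr):
        """Return (byte_value, hit) for a one-byte read at `addr`."""
        line_address = addr // self.line_bytes
        index = line_address & ((1 << self.index_bits) - 1)
        tag = line_address >> self.index_bits
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Miss: fetch the entire line and overwrite the previous entry.
            start = line_address * self.line_bytes
            self.data[index] = bytes(self.memory[start:start + self.line_bytes])
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][addr % self.line_bytes], hit
```

In this model, two addresses whose line addresses differ only in the tag portion compete for the same entry, so alternating reads between them miss every time.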
In PC AT-compatible computers, the chipset performs all the management functions for the cache, while the cache data memory itself is located in SRAM off-chip. The tag memory is also located off-chip in a tag RAM. The user can specify, through a user setup program which programs registers in the chipset, which memory address ranges are to be cacheable and which are not. Due to the special nature of addresses A0000h-BFFFFh and C8000h-FFFFFh, these addresses are never cacheable in a typical chipset.
On system power-up, the external cache data and tag RAM both contain random data, including in the valid bit. Unless special precautions are taken, therefore, one or more lines of random data in the cache erroneously may appear to the chipset to contain valid information. One solution to this problem might be to use a dedicated tag RAM chip which has a "flush" pin. The CY7B181 chip manufactured by Cypress Semiconductor Corp. is one such chip. The flush pin would be connected to the system reset line to force the tag RAM to invalidate all its entries before the first instruction fetch by the CPU. Dedicated tag RAM chips are expensive, however, and preferably avoided in PC AT-compatible computers.
In some chipsets, the problem is solved using standard Static RAM (SRAM) chips to store tag RAM. These chipsets power up with caching disabled, and special routines in the setup program, or in a driver, invalidate each cache tag entry before enabling caching. Since the tag RAM is not directly accessible by the CPU in PC AT architectures, however, this technique usually requires the provision of special registers in the chipset through which the accesses can be made. It also requires specialized setup program code to accomplish the flush, which is undesirable since industry standard BIOS ROMs generally cannot be used. The technique also imposes a small time delay in the boot procedure which it would be desirable to avoid.
In the 386WT chipset, a dedicated tag RAM was used which included an "invalidate" input pin to clear the valid bit for the entry currently being addressed. The chipset itself included an "invalidate" output for connection to that pin, and the chipset solved the power-up cache-flush problem by powering up in a default state with caching disabled and including logic to activate the invalidate output whenever caching was disabled. Thus, when the POST performed its standard memory test operation, which included (among other things) reads from all the bytes in a memory address range much larger than the maximum allowed cache size of 256k bytes, all the tag RAM entries were invalidated automatically. This solution avoided the need for any special setup program routines, but still required the use of expensive dedicated tag RAM.
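The effect of this arrangement can be illustrated with a small simulation. It is only a sketch: the class and function names are hypothetical, the 16-byte line size is an assumption not taken from this description, and the random valid bits stand in for the unknown power-up contents of the tag RAM.

```python
import random

CACHE_BYTES = 256 * 1024   # maximum cache size mentioned above
LINE_BYTES = 16            # assumed line size, for illustration only


class TagRam:
    """Tag RAM model that keeps only the valid bits."""

    def __init__(self, entries, rng):
        # At power-up every valid bit holds an unpredictable value.
        self.valid = [rng.random() < 0.5 for _ in range(entries)]

    def invalidate(self, index):
        # Models the "invalidate" pin: clear the entry being addressed.
        self.valid[index] = False


def memory_sweep(tag_ram, sweep_bytes, caching_enabled=False):
    """Read every line in 0..sweep_bytes as the POST memory test would.

    While caching is disabled, each access also invalidates the entry
    that the address selects.
    """
    entries = len(tag_ram.valid)
    for addr in range(0, sweep_bytes, LINE_BYTES):
        index = (addr // LINE_BYTES) % entries
        if not caching_enabled:
            tag_ram.invalidate(index)


tag_ram = TagRam(CACHE_BYTES // LINE_BYTES, random.Random(0))
memory_sweep(tag_ram, 640 * 1024)   # low memory range 0h-9FFFFh
print("entries still valid:", sum(tag_ram.valid))
```

Because the swept range is larger than the cache, every index is addressed at least once and no entry remains valid. A sweep shorter than the cache size would leave the untouched entries in their power-up state.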