The present invention relates generally to digital computers and, more specifically, to cache memory management in digital computers.
As the performance demands on digital computers continue to increase at a meteoric pace, processors have been developed which operate at higher and higher clock speeds. The instruction sets used to control these processors have been pared down (e.g., RISC architecture) to make them more efficient. Processor improvements alone, however, are insufficient to provide the greater bandwidth required by computer users. The other computer subsystems which support the processor, e.g., I/O devices and memory devices, must also be designed to operate at higher speeds and support greater bandwidth. In addition to improved performance, cost has always been an issue with computer users. Thus, system designers are faced with the dual challenges of improving performance while remaining competitive on a cost basis.
Cache memory systems were designed with these competing goals in mind. If the processor must wait for a memory system to access data, then the memory system becomes a bottleneck and reduces system efficiency. The ready solution of devising and incorporating the fastest possible memory devices for the entire digital computer memory is uneconomical, both because of the large amounts of memory used in today's digital computers and because, typically, the faster the memory device, the greater the cost of that device. Cache memories are essentially high-speed buffers for holding data which provide an interface between the processor and the main memory. By adding a cache memory between a fast processor and a main memory system that is slower than the cache, a designer can provide an apparently fast memory at an affordable cost.
This ability of a cache system stems from a general tendency of many programs to access data and program instructions that have been recently accessed or that are located in nearby memory locations. If the processor needs data that is not resident in the cache (a cache "miss"), it accesses the main memory array. The data fetched from the main memory array then replaces some of the data in the cache with the expectation that it will be needed again soon. In a properly implemented cache, the rate at which data is found in the cache (the cache "hit" rate) can exceed 90% of all accesses, depending upon the type of software and data structures being implemented.
Cache memories are commonly divided into two sections, a data storage section which holds and delivers the data (for example, a high speed storage device such as SRAM) and a tag storage section which stores the corresponding main memory addresses of data stored in the data storage section. When the processor initiates a memory read operation, the processor sends the physical address associated with the memory access to a cache controller which internally latches the address for operation. The cache controller compares the physical address in the internal latch with the tags which are currently stored in the tag storage section. If the cache controller finds a match, then a cache hit has occurred and the corresponding datum is retrieved from the data storage section and forwarded to the processor. If the cache controller does not find a match, then a cache miss has occurred and the corresponding datum is retrieved from main memory, forwarded to the processor and stored in the cache data storage section. Since the cache memory is very fast, cache hits take less time to process than retrieving the data directly from main memory. Cache misses, on the other hand, take longer to process than retrieving data directly from main memory. The additional latency for cache misses is referred to as the miss penalty.
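The read path described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization, in which the low-order part of the address selects one cache line and the remainder is kept as the tag; the text above does not specify an organization, so this choice is an assumption made only for brevity. The names DirectMappedCache and average_access_time, and the timing figures in the usage note, are hypothetical.

```python
class DirectMappedCache:
    """Toy cache with separate tag and data storage sections.

    Direct mapping is an assumption of this sketch, not something the
    surrounding description requires.
    """

    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # tag storage section
        self.data = [None] * num_lines   # data storage section
        self.main_memory = main_memory   # dict mapping address -> datum
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines   # which cache line to inspect
        tag = address // self.num_lines    # identifies the cached address
        if self.tags[index] == tag:
            # Cache hit: deliver the datum from the data storage section.
            self.hits += 1
            return self.data[index]
        # Cache miss: fetch from main memory, forward the datum, and
        # store it in the cache, replacing the previous occupant.
        self.misses += 1
        datum = self.main_memory[address]
        self.tags[index] = tag
        self.data[index] = datum
        return datum


def average_access_time(hit_rate, hit_time, miss_time):
    """Mean access time; miss_time already includes the miss penalty."""
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time
```

As a purely illustrative figure, a 90% hit rate with a hit time of 1 time unit and a miss time of 12 time units gives an average of roughly 2.1 time units per access, which is why the combined memory appears fast to the processor even though most of it is slow.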
To generalize, memory write operations can be handled in two different ways. If the location in memory being written has a corresponding copy in the cache, then the cache updates its copy of the datum. The cache can either concurrently forward the datum to the main memory array (a "write-through" cache) or it can wait until later to update the main memory (a "write-back" or "copy-back" cache). The write-through scheme provides cache coherency with the main memory array, assuming that all memory transactions are handled in the same way, since the data in both the cache and corresponding locations in the main memory will be the same. On the other hand, the copy-back scheme provides some advantages in terms of speed since the number of write operations to the slower main memory is reduced, but a monitoring scheme is needed to resolve any incoherence between the cache and the main memory.
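The difference between the two write schemes can be sketched as follows. This is a simplified model under stated assumptions: the class name WritePolicyCache, the dirty set used as the monitoring scheme, and the flush method are all hypothetical, and a write to a location with no cached copy is assumed to go straight to main memory without allocating a cache entry.

```python
class WritePolicyCache:
    """Toy cache illustrating write-through versus write-back (copy-back)."""

    def __init__(self, main_memory, write_back=False):
        self.main_memory = main_memory  # dict mapping address -> datum
        self.lines = {}                 # cached copies, address -> datum
        self.dirty = set()              # addresses newer in cache than in memory
        self.write_back = write_back

    def load(self, address):
        # Bring a copy of the datum into the cache.
        self.lines[address] = self.main_memory[address]

    def write(self, address, datum):
        if address in self.lines:
            # A cached copy exists, so the cache updates its copy.
            self.lines[address] = datum
            if self.write_back:
                # Copy-back: defer the main memory update and remember it.
                self.dirty.add(address)
                return
        # Write-through hit, or a write with no cached copy (assumed to
        # bypass the cache): main memory is updated immediately.
        self.main_memory[address] = datum

    def flush(self):
        # Copy every deferred update to main memory, restoring coherency.
        for address in sorted(self.dirty):
            self.main_memory[address] = self.lines[address]
        self.dirty.clear()
```

In the write-back configuration, main memory holds an out-of-date value between the write and the flush, which is the window in which a monitoring scheme is needed.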
Many systems also provide for software control over the cache memory. For example, it may be desirable in certain cases to define areas of main memory as being noncacheable. In particular, this may be desirable for areas of memory which are used in ways which do not follow the tendency described above of repeated access to the same or nearby memory locations, i.e., the types of memory accesses which would not benefit from searching the cache memory and which would instead be subject to the miss penalty. One example is a main memory area which is used to hold blocks of data. If such memory areas were cacheable, then reusable instructions might be replaced in the cache by unreusable data, thereby degrading cache performance. The software can control usage of the cache memory by, for example, declaring one or more ranges of addresses in the main memory to be noncacheable. FIG. 1 shows an exemplary memory map wherein blocks 00-1A are cacheable, blocks 1A-1C are noncacheable, and the remaining blocks are cacheable. This memory map can be used by the processor to, for example, set an inhibit bit in a register which is used to control memory access modes. If, for instance, an address to be asserted on the address bus is found in a noncacheable region of the memory map, then the processor sets the inhibit bit equal to one and the memory access is completed by referencing that address in the main memory, completely bypassing the cache. Thus, the accessed location is not loaded into the cache (if the access is a write operation) nor is the location allocated in the cache (if the access is a read-miss operation). Similarly, when the cache inhibit bit is set, copies of accessed data currently in the cache are not updated, flushed, or invalidated.
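The use of a memory map to derive the inhibit bit can be sketched as follows. The sketch models only read accesses and makes several assumptions: block 1A is treated as the first noncacheable block and block 1C as the first cacheable block after the range (the boundaries of FIG. 1 are read as half-open), the cache is modeled as a simple dictionary, and the names NONCACHEABLE_RANGES, inhibit_bit, and access are hypothetical.

```python
# Noncacheable block ranges as (first block, block after the last), i.e.
# half-open intervals. Reading FIG. 1 this way is an assumption.
NONCACHEABLE_RANGES = [(0x1A, 0x1C)]


def inhibit_bit(block, noncacheable_ranges=NONCACHEABLE_RANGES):
    """Return 1 if the block lies in a noncacheable region, else 0."""
    return int(any(lo <= block < hi for lo, hi in noncacheable_ranges))


def access(block, cache, main_memory, noncacheable_ranges=NONCACHEABLE_RANGES):
    """Read a block, bypassing the cache when the inhibit bit is set."""
    if inhibit_bit(block, noncacheable_ranges):
        # Inhibited access: reference main memory directly. The cache is
        # neither searched nor loaded, so its contents are left untouched.
        return main_memory[block]
    if block not in cache:
        # Read miss on a cacheable block: allocate it in the cache.
        cache[block] = main_memory[block]
    return cache[block]
```

Keeping the ranges in a plain list mirrors the idea that software may redefine them; a different program could supply its own list in place of the default.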
By providing software with the opportunity to define its own regions of noncacheable main memory, cache performance can be optimized by only looking to the cache for ranges of main memory which are relatively more likely to be reaccessed. However, the designation of memory ranges as cacheable or noncacheable can be changed by the software dynamically or, when another program is loaded, a new memory map may be created. When this occurs, the portion of memory which was earlier declared as noncacheable, for example blocks 1A-1C in FIG. 1, may now be declared as cacheable memory. In that case, the cache controller cannot determine whether the images within this range which are currently stored in the cache remain valid, since the corresponding data may have changed while this range of addresses was noncacheable. Accordingly, the cache controller will have to invalidate all the current tag entries for at least this range of addresses and, possibly, the entire cache memory.
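The stale-image problem and the resulting invalidation can be sketched as follows. The cache is again modeled as a dictionary keyed by block number, the range is treated as half-open, and the function name invalidate_range is hypothetical; an actual cache controller would clear valid bits or tag entries in hardware rather than delete dictionary keys.

```python
def invalidate_range(cache, start, end):
    """Invalidate every cached block in [start, end); return the count."""
    stale = [block for block in cache if start <= block < end]
    for block in stale:
        del cache[block]
    return len(stale)


# Block 0x1A was cached before its range was declared noncacheable.
main_memory = {0x10: "code", 0x1A: "old"}
cache = {0x10: "code", 0x1A: "old"}

# While the range is noncacheable, a write bypasses the cache, so the
# cached image is not updated and silently becomes stale.
main_memory[0x1A] = "new"

# When the range is declared cacheable again, the controller cannot tell
# which images are stale, so it invalidates the whole range.
removed = invalidate_range(cache, 0x1A, 0x1C)
```

After the call, only the image of block 0x1A has been discarded, and the next read of that block would miss and fetch the current value from main memory. Invalidating the entire cache instead would be simpler but would also discard valid images such as block 0x10.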