Referring to FIG. 1, a system 100 containing high-performance central processing units (CPUs) 102, 104, 106 will usually provide a cache memory 112, 114, 116 for each CPU. The system may also include other types of processors, such as an input/output processor 118. A cache memory increases the CPU's performance by satisfying most of the CPU's memory references, instead of requiring a reference to main memory 120 for every reference made by the CPU. Since the access time of the cache (e.g., 10 nanoseconds) is usually much less than the access time of main memory (e.g., 400 nanoseconds), performance is increased.
In a multiprocessor system in which a number of processors 102-106 and their caches 112-116 share a common memory bus 122, the caches also serve to shield the bus from the memory traffic generated by the CPUs. A "write back" strategy, which returns a cache block to main storage 120 only when the cache block is needed for another address, is a particularly effective method of reducing bus traffic.
With the availability of dual-ported dynamic memories, commonly known as video RAMs, it has become straightforward to build frame buffers 124, 126 that place the display pixelmap (also known as a bit map) in the physical address space of one or more CPUs. The video RAMs in such frame buffers have a serial port, which is used to refresh a raster-scanned monitor 130 or 132, and a parallel port used by the CPUs for updating the image data stored in the video RAMs. Using the CPU to update the contents of the frame buffers can represent a substantial savings over using specialized hardware, and the rate at which updates to an image can be computed and stored in a frame buffer is quite respectable using currently available high performance CPUs.
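The effect of the example access times given above can be illustrated with a short calculation. The following is a minimal sketch only, under a simplified model in which a cache hit costs the cache access time and a miss costs the main memory access time. The function name `average_access_time_ns` and the hit rates shown are illustrative assumptions and do not appear in the description.

```python
def average_access_time_ns(hit_rate, cache_ns=10.0, main_ns=400.0):
    """Average access time under a simplified model (an assumption):
    a hit costs cache_ns and a miss costs main_ns."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * main_ns


# Hypothetical hit rates, chosen only to show the trend.
for h in (0.0, 0.90, 0.95, 0.99):
    print(f"hit rate {h:.2f}: {average_access_time_ns(h):.1f} ns")
```

Under this model, even a modest rise in the fraction of misses raises the average access time noticeably, because each miss costs 40 times as much as a hit with the example figures.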
When a write-back cache is used in conjunction with a memory-mapped frame buffer, three problems can occur.
The first problem with using a write-back cache in conjunction with a frame buffer is that data values in the cache are not written back to the frame buffer until the cache block holding the frame buffer data is needed to hold some other block of data. Thus changes to the image on the display may be delayed for an arbitrary amount of time after the pixelmap is modified by the CPU. In other words, the displayed image may not reflect the computed image data for an unpredictable period of time.
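The delay described above can be illustrated with a toy model. The following is a minimal sketch, assuming a direct mapped write-back cache holding one word per block. The class `WriteBackCache` and the dictionary standing in for the frame buffer are hypothetical and serve only to illustrate the behavior.

```python
class WriteBackCache:
    """Toy direct mapped write-back cache, one word per block (an assumption)."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory   # backing store: dict mapping address -> value
        self.lines = {}        # line index -> (address, value, dirty flag)

    def write(self, addr, value):
        index = addr % self.num_lines
        held = self.lines.get(index)
        if held is not None and held[0] != addr and held[2]:
            # A dirty block is displaced: only now does memory see its value.
            self.memory[held[0]] = held[1]
        self.lines[index] = (addr, value, True)


frame_buffer = {0: 0}                  # pixel at address 0 starts dark
cache = WriteBackCache(4, frame_buffer)

cache.write(0, 255)                    # CPU updates the pixel
print(frame_buffer[0])                 # 0: the display still shows the old value

cache.write(4, 1)                      # unrelated write maps to the same line
print(frame_buffer[0])                 # 255: the update arrives only on eviction
```

In this sketch the time between the CPU's write and the change on the display depends entirely on when some other address happens to need the same cache line.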
The second problem is that caches frequently fetch an entire block of information from main memory whenever the cache does not contain a referenced address, even when the operation issued by the CPU is a write operation. For normal programs, this is a good strategy, since most locations are read before they are written. In a frame buffer, this is frequently not the case, and locations are often written without being read first. In this case, the data fetched into the cache will be overwritten immediately, so the fetch represents wasted work.
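The wasted work can be counted with a small model. The following is a minimal sketch, assuming a cache that fetches the whole block on every write miss and is large enough that no block is evicted during the run. The function `count_block_fetches` and the block size of four words are illustrative assumptions.

```python
def count_block_fetches(write_addresses, block_size_words=4):
    """Count blocks fetched from memory by a cache that fetches on a
    write miss. Unlimited capacity is assumed for simplicity."""
    resident = set()
    fetches = 0
    for addr in write_addresses:
        block = addr // block_size_words
        if block not in resident:
            fetches += 1          # the whole block is read from memory
            resident.add(block)
    return fetches


# A run of 64 sequential frame buffer words, each written once, never read.
fetches = count_block_fetches(range(64))
print(fetches)        # 16 block fetches
print(fetches * 4)    # 64 words fetched, every one of them then overwritten
```

In this example every word brought into the cache is overwritten before it is ever read, so all 16 block fetches consume bus and memory bandwidth without benefit.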
The third problem concerns the tendency of frame buffer data to displace other data blocks needed in the cache. In a direct mapped cache the data stored in a particular address in main memory can be stored in only one location in the cache. Direct mapped caches are frequently used because they are effective and less costly than other cache mapping organizations. Unfortunately, when a direct mapped cache is used with a frame buffer, overall system performance may be severely degraded. The reason for this is that the references made by a CPU to a frame buffer may not exhibit the spatial and temporal locality of normal program references. In particular, it is often the case that a long run of sequential frame buffer locations will be referenced, with each location being referenced exactly once. The result of this is that a direct mapped cache will become filled with display data, which will displace other cache information, including the data and program text of the program that modified the display image. This displacement will cause the number of cache misses to increase substantially, increasing the average access time seen by the CPU and lowering the system's performance.
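The displacement effect can be illustrated with a small simulation. The following is a minimal sketch, assuming an eight-line direct mapped cache indexed by block number modulo the number of lines. The class `DirectMappedCache`, the function `misses_on_rerun`, and the block numbers used are hypothetical.

```python
class DirectMappedCache:
    """Toy direct mapped cache: each block can live in exactly one line."""

    def __init__(self, num_lines):
        self.tags = [None] * num_lines

    def access(self, block):
        """Reference a block; return True if the reference misses."""
        index = block % len(self.tags)
        miss = self.tags[index] != block
        self.tags[index] = block
        return miss


def misses_on_rerun(sweep_blocks, num_lines=8):
    """Warm the cache with a small program working set, sweep through
    the given frame buffer blocks once, then count misses when the
    program touches its working set again."""
    cache = DirectMappedCache(num_lines)
    program = [0, 1, 2, 3]            # hypothetical program working set
    for block in program:
        cache.access(block)
    for block in sweep_blocks:
        cache.access(block)
    return sum(cache.access(block) for block in program)


print(misses_on_rerun([]))                # 0: working set is still cached
print(misses_on_rerun(range(100, 108)))   # 4: the sweep displaced all of it
```

A single pass over eight sequential frame buffer blocks touches every line of this cache, so the program's entire working set must be fetched again even though the display data itself is never reused.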
The standard, prior art solution to the above problems is to operate frame buffers in an uncached portion of the system's address space. This means that such systems cannot take any advantage of the presence of a cache for the processing of frame buffer (i.e., image) data.
The present invention addresses each of the three problems listed above. By making modifications to the design of the cache and the frame buffer, it provides a design that supports frame buffers more effectively than previous arrangements.