As is well known in the field, modern high-performance data processing systems are conventionally implemented using single-chip microprocessors as the central processing units (CPUs), and using semiconductor random-access memory (RAM) as main system memory. The main memory is generally implemented in the form of dynamic RAM (DRAM) devices, which are of high density and low cost-per-bit; however, the access and cycle times of conventional DRAM are relatively slow, and are not able to keep up with the clock rates of modern microprocessors.
Conventional microprocessor-based data processing systems have addressed the performance limitations of main memory access, while still obtaining the low-cost benefit of high-density DRAM, through the use of cache memories. Cache memories are typically small blocks of high speed static RAM (SRAM), either on-chip with the microprocessor or off-chip (or both), for storing the contents of memory locations that are likely to be accessed in the near future. Typically, cache memory will store the contents of memory locations that are near neighbors to a memory location that was recently accessed; because microprocessors often access memory in a sequential fashion, it is likely that successive memory accesses in successive cycles will access memory addresses that are very close to one another in the memory space. Accordingly, by storing the neighboring memory location contents in a cache, a good portion of the memory accesses may be made by the microprocessor to cache, rather than to main memory. The overall performance of the system will thus be improved through the implementation of cache memory. Some modern microprocessors include multiple levels of cache memory, with the capacity of the cache increasing (and its speed decreasing) with each successive level, to optimize performance. Intelligent cache design and implementation can greatly improve system performance by minimizing accesses to main DRAM memory.
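The benefit of storing neighboring locations can be illustrated with a toy model. The following is a minimal sketch in Python, not a description of any particular cache: it assumes a 32-byte line, an unbounded cache that never evicts, and a hypothetical helper named hit_ratio; none of these details come from the systems described above.

```python
LINE_SIZE = 32  # bytes per cache line (assumed for illustration)


def hit_ratio(addresses, line_size=LINE_SIZE):
    """Return the fraction of reads served by a toy cache.

    A miss on any byte of a line brings the whole line into the cache,
    so later reads of neighboring addresses hit. The cache is unbounded
    (no eviction), which is a simplifying assumption.
    """
    addresses = list(addresses)
    cached_lines = set()
    hits = 0
    for addr in addresses:
        line = addr // line_size
        if line in cached_lines:
            hits += 1
        else:
            cached_lines.add(line)  # line fill on a miss
    return hits / len(addresses)


# 256 sequential 4-byte reads: one miss per 32-byte line, seven hits after it.
sequential_reads = range(0, 1024, 4)
print(hit_ratio(sequential_reads))  # 0.875
```

Under these assumptions, seven of every eight sequential reads are served from the cache; an access pattern that touches each line only once would see no hits at all.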
Another approach toward improving memory access performance in microprocessor-based systems is the use of special memory access cycles, commonly referred to as "burst" access cycles. Burst memory access cycles are used, in the operation of the memory devices, to provide access to a series of memory locations. Typically, the burst access is effected by way of a memory controller chip placed between the microprocessor and main memory, and operates in response to the address information and control signals presented by the microprocessor. Burst cycles are highly effective in improving the performance of memory accesses. For example, in a modern system having an eight-byte bus, a burst cycle can access thirty-two bytes of memory with the presentation of a single memory address in as few as five bus cycles (2-1-1-1), when using a best case cache. Burst access is also highly efficient using page mode DRAM, in which a thirty-two byte access may be performed in a bus cycle sequence of 8-3-3-3 (totaling seventeen bus cycles), and using special DRAM functions such as Extended Data Out (EDO) and synchronous DRAM, in which thirty-two byte burst accesses may be accomplished, in the best case, with a bus cycle sequence of 6-1-1-1 (totaling nine bus cycles). This is a drastic improvement over the non-burst case, in which access of a 32-byte line requires 64 cycles when accessed as a group of eight separate 4-byte reads (considering that non-burst accesses are generally not longer than 4 bytes). As such, burst mode memory access is typically twice to six times as fast as non-burst access.
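The cycle counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the dictionary labels and function names are illustrative only, and the 64-cycle non-burst figure is taken directly from the text (eight separate reads, which implies eight bus cycles per read).

```python
# Bus-cycle sequences quoted in the text. Each entry is the number of
# bus cycles for one 8-byte transfer of a 32-byte burst.
BURST_SEQUENCES = {
    "cache (best case)": (2, 1, 1, 1),
    "page mode DRAM": (8, 3, 3, 3),
    "EDO / synchronous DRAM (best case)": (6, 1, 1, 1),
}

# Non-burst figure from the text: eight separate 4-byte reads of a 32-byte line.
NON_BURST_CYCLES = 64


def total_cycles(sequence):
    """Total bus cycles for a burst described as a lead-off plus follow-on cycles."""
    return sum(sequence)


def speedup(sequence, baseline=NON_BURST_CYCLES):
    """Ratio of the non-burst cycle count to the burst cycle count."""
    return baseline / total_cycles(sequence)


for name, seq in BURST_SEQUENCES.items():
    print(f"{name}: {total_cycles(seq)} cycles, {speedup(seq):.1f}x vs non-burst")
```

The totals come out to 5, 17, and 9 bus cycles. The ratios are computed from the quoted sequences, two of which are best-case figures, so they should be read as upper bounds for those cases rather than as typical gains.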
In microprocessors utilizing the well-known "x86" architecture, including the so-called "Pentium-class" microprocessors (referring to microprocessors having functionality and instruction set compatibility with PENTIUM microprocessors available from Intel Corporation), burst memory accesses are linked to cache operations. In other words, in these x86-architecture microprocessors, burst memory operations are performed only in connection with cache line fill operations (reads from memory) and cache write-back operations (writes to memory). Given the cache architecture of these microprocessors, where most data and instruction retrieval is accomplished by way of cache memory, the performance provided by performing burst memory accesses for cache operations is quite high.
Caching typically works quite well for "true" memory locations, to and from which only the microprocessor writes and reads data using conventional memory access operations, because the microprocessor can ensure that its cache copy of the memory location matches the copy in main memory. So long as the cache and main memory copies of the same memory locations are the same, reading of the cache copy instead of the main memory copy will have no side effects. However, certain memory locations, such as those containing the status of an I/O device or those portions of the screen buffer that may be changed by a graphics accelerator, are volatile to the extent that cache copies of these memory locations would be frequently out-of-date. The reading of a cache copy of these volatile memory locations, in lieu of the main memory locations, could have significant side effects in system operation. Accordingly, accesses by the microprocessor to these volatile locations are blocked from being "cacheable" (i.e., from being stored in cache memory) in conventional IBM PC architecture systems, typically by the operation of a memory controller.
For example, a memory-mapped register is generally blocked from cacheable access, despite being accessed via conventional memory access operations, because the memory-mapped register is often polled to detect changes in device status, responsive to which certain control functions are effected. If the memory-mapped register were cached, changes in device status would be reflected in the main memory copy of the memory-mapped register but not in the cached copy; periodic polling of the memory-mapped register would read the cache copy only, and would therefore not detect the sought-for change in device status, effectively bringing control to a standstill. By way of another example, the caching of non-memory devices such as memory-mapped I/O functions may cause additional side effects for those types of I/O devices which change state in response to a read operation on the bus, since reads of on-chip cache memory do not appear as bus cycles. Write-back caching also presents side effects for these non-memory locations, as the cache could contain a more up-to-date copy than main memory; since writes to write-back cache do not appear on the bus, the caching of these locations would appear to reorder writes performed on the bus.
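The polling failure described above can be reproduced in a toy model. The following is a minimal sketch in Python, assuming a hypothetical ToyCache class, a hypothetical register address STATUS_REG, and a status encoding of 0 for busy and 1 for ready; none of these names or values come from an actual system.

```python
class ToyCache:
    """Hypothetical read cache: the first read of an address fills it, later reads hit."""

    def __init__(self, memory):
        self.memory = memory  # backing store, modeled as a dict of address -> value
        self.lines = {}       # cached copies

    def read(self, addr, cacheable=True):
        if not cacheable:
            return self.memory[addr]  # non-cacheable: always read the backing store
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # fill on a miss
        return self.lines[addr]


STATUS_REG = 0x1000        # hypothetical memory-mapped status register
memory = {STATUS_REG: 0}   # assumed encoding: 0 = busy, 1 = ready
cache = ToyCache(memory)

first_poll = cache.read(STATUS_REG)   # miss: the busy value is now cached
memory[STATUS_REG] = 1                # the device changes status behind the cache
stale_poll = cache.read(STATUS_REG)   # hit: still returns the old busy value
fresh_poll = cache.read(STATUS_REG, cacheable=False)  # sees the new ready value

print(first_poll, stale_poll, fresh_poll)  # 0 0 1
```

In this model the cached poll never observes the status change, while the non-cacheable read does, which is the reason such locations are blocked from cacheable access.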
Another example of a memory area that is volatile and therefore typically blocked from cacheable access is video memory, which is logically within the memory map of the microprocessor and physically located either within or separate from main memory (such as in a graphics adaptor). Video memory is often under the control of a device other than the microprocessor, such as a graphics processor or graphics adaptor, and is therefore not suitable for cacheable access by the main microprocessor as its contents are frequently changed outside of the control of the microprocessor. If a portion of video memory were to be stored in the microprocessor cache, the cache contents would likely be invalid for subsequent accesses because of the changes made by the graphics processor.
According to conventional x86-architecture microprocessors, therefore, burstable memory accesses are linked to the cacheability of the memory location to be accessed. For example, the PENTIUM microprocessor requests a burstable memory access by asserting a control signal at terminal CACHE# (the # indicating that the signal is active at a low logic level) during an access to memory (indicated by the microprocessor presenting a high logic level at terminal M/IO#). Responsive to this request, the memory controller determines if the memory address presented by the microprocessor is in a cacheable area of the memory space and, if so, asserts the KEN# input to the microprocessor and effects the burst access. According to this conventional implementation, if the microprocessor requests a burstable access to an area of memory that is blocked from cacheable access, the memory controller will not effect a burstable access, and will indicate the same by deasserting KEN#. Single transfer access to the desired memory location will then proceed.
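The request and response sequence described above can be summarized as a small decision function. The following is a minimal sketch in Python, not the behavior of any actual memory controller: the BusRequest type, the controller_response function, and the non-cacheable address range are all hypothetical, and the active-low signals are modeled as booleans meaning "asserted".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusRequest:
    address: int
    cache_asserted: bool  # True when CACHE# is driven low (burst requested)
    memory_access: bool   # True when M/IO# is high (memory, not I/O)


# Hypothetical non-cacheable regions as inclusive (start, end) ranges,
# for example a video memory window. The range is illustrative only.
NON_CACHEABLE_RANGES = [(0xA0000, 0xBFFFF)]


def is_cacheable(address):
    """True if the address falls outside every non-cacheable range."""
    return not any(lo <= address <= hi for lo, hi in NON_CACHEABLE_RANGES)


def controller_response(request):
    """Model the conventional behavior: a burst is granted only for cacheable memory."""
    wants_burst = request.cache_asserted and request.memory_access
    ken_asserted = wants_burst and is_cacheable(request.address)
    return {
        "ken_asserted": ken_asserted,
        "transfer": "burst" if ken_asserted else "single",
    }


# A burst request to ordinary memory is granted; one to the blocked range is not.
print(controller_response(BusRequest(0x100000, True, True)))
print(controller_response(BusRequest(0xA8000, True, True)))
```

The second request illustrates the linkage at issue: the microprocessor asked for a burstable access, but because the address is blocked from cacheable access, the model deasserts KEN# and falls back to a single transfer.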