The present embodiments relate to microprocessors, and are more particularly directed to microprocessor circuits, systems, and methods with a class categorized storage circuit for storing non-cacheable data.
As is evident in the field, modern high performance data processing systems are conventionally implemented using single-chip microprocessors as the central processing units (CPU), and using semiconductor random-access memory (RAM) as main system memory. The main memory is generally implemented in the form of dynamic RAM (DRAM) devices, which are of high density and low cost-per-bit; however, the access and cycle times of conventional DRAM memory are relatively slow, and are not able to keep up with the clock rates of modern microprocessors.
Conventional microprocessor-based data processing systems have addressed the performance limitations of main memory access, while still obtaining the low-cost benefit of high-density DRAM, through the use of cache memories. Cache memories are typically small blocks of high speed static RAM (SRAM), either on-chip with the microprocessor or off-chip (or both), for storing the contents of memory locations that are likely to be accessed in the near future. Typically, cache memory stores the contents of memory locations that are near neighbors to a memory location that was recently accessed; because microprocessors often access memory in a sequential fashion, it is likely that successive memory accesses in successive cycles will access memory addresses that are very close to one another in the memory space. Accordingly, by storing the neighboring memory location contents in a cache, a good portion of the memory accesses may be made by the microprocessor to cache, rather than to main memory. The overall performance of the system is thus improved through the implementation of cache memory. Some modern microprocessors include multiple levels of cache memory, with the capacity of the cache increasing (and its speed decreasing) with each successive level, to optimize performance. Intelligent cache design and implementation can greatly improve system performance by minimizing accesses to main memory.
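By way of illustration only, the effect of sequential access on cache hit rate may be sketched as follows. The direct-mapped organization, the 32-byte line size, the eight-line capacity, and the name `count_hits` are assumptions chosen for the sketch and are not taken from any particular microprocessor.

```python
# Illustrative model only: a tiny direct-mapped cache that records which
# memory line currently occupies each cache slot. All sizes are assumed.
LINE_BYTES = 32   # assumed cache line size
NUM_LINES = 8     # assumed number of cache lines


def count_hits(addresses, line_bytes=LINE_BYTES, num_lines=NUM_LINES):
    """Return (hits, misses) for a sequence of byte addresses."""
    tags = {}                 # cache slot index -> memory line number held
    hits = misses = 0
    for addr in addresses:
        line_number = addr // line_bytes    # which memory line is touched
        index = line_number % num_lines     # which cache slot it maps to
        if tags.get(index) == line_number:
            hits += 1                       # neighbor of a recent access
        else:
            misses += 1                     # line must come from main memory
            tags[index] = line_number
    return hits, misses


if __name__ == "__main__":
    # 4-byte reads walking sequentially through 1024 bytes of memory.
    hits, misses = count_hits(range(0, 1024, 4))
    print(f"sequential: {hits} hits, {misses} misses")
    # Reads that jump a full line each time never reuse a fetched line.
    hits, misses = count_hits(range(0, 1024, LINE_BYTES))
    print(f"line-strided: {hits} hits, {misses} misses")
```

In this model, only the first access to each line goes to main memory when addresses are sequential, whereas an access pattern that never touches the same line twice gains nothing from the cache.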
Another approach toward improving memory access performance in microprocessor-based systems is the use of special memory access cycles, commonly referred to as "burst" access cycles. Burst memory access cycles are used, in the operation of the memory devices, to provide access to a series of memory locations. Typically, the burst access is performed by way of a memory controller chip placed between the microprocessor and main memory, and which operates in response to the address information and control signals presented by the microprocessor. Burst cycles are highly effective in improving the performance of memory accesses. For example, in a modern system having an eight-byte bus, a burst cycle can access thirty-two bytes of memory with the presentation of a single memory address in as few as five bus cycles (2-1-1-1), when using a best case cache. Burst access is also highly efficient using page mode DRAM, in which a thirty-two byte access may be performed in a bus cycle sequence of 8-3-3-3 (totaling seventeen bus cycles), and using special DRAM functions such as Early Data Out (EDO) and synchronous DRAM, in which thirty-two byte burst accesses may be accomplished, in the best case, with a bus cycle sequence of 6-1-1-1 (totaling nine bus cycles). This is a drastic improvement over the non-burst case in which access of a 32-byte line requires 64 cycles when accessed as a group of eight separate 4 byte reads (considering that non-burst accesses are generally not longer than 4 bytes). As such, burst mode memory access is typically twice to six times as fast as non-burst cycles.
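The cycle counts quoted above can be checked with a short calculation. The following is a minimal sketch; the dictionary and function names are hypothetical, and the figures are simply those stated in the preceding paragraph.

```python
# Bus cycle sequences quoted in the text for a 32-byte burst on an
# eight-byte bus: one lead-off access followed by three further transfers.
BURST_PATTERNS = {
    "best-case cache": (2, 1, 1, 1),
    "page-mode DRAM": (8, 3, 3, 3),
    "EDO / synchronous DRAM": (6, 1, 1, 1),
}

LINE_BYTES = 32
NON_BURST_READ_BYTES = 4      # text: non-burst accesses are at most 4 bytes
NON_BURST_TOTAL_CYCLES = 64   # text: 64 cycles for the non-burst line access


def burst_cycles(pattern):
    """Total bus cycles for one burst, given its per-transfer cycle counts."""
    return sum(pattern)


def non_burst_cycles_per_read():
    """Cycles per individual read implied by the quoted non-burst total."""
    reads = LINE_BYTES // NON_BURST_READ_BYTES   # eight separate reads
    return NON_BURST_TOTAL_CYCLES // reads


def speedup(pattern):
    """Ratio of the quoted non-burst total to the burst total."""
    return NON_BURST_TOTAL_CYCLES / burst_cycles(pattern)


if __name__ == "__main__":
    for name, pattern in BURST_PATTERNS.items():
        print(f"{name}: {burst_cycles(pattern)} cycles, "
              f"{speedup(pattern):.1f}x versus non-burst")
```

The ratio obtained depends on which non-burst baseline is assumed; the sketch uses only the 64-cycle figure given in the text.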
In microprocessors utilizing the well-known "x86" architecture, including the so-called "Pentium-class" microprocessors (referring to microprocessors having functionality and instruction set compatibility with PENTIUM microprocessors available from Intel Corporation), burst memory accesses are linked to cache operations. In other words, in these x86-architecture microprocessors, burst memory operations are performed only in connection with cache line fill operations (reads from memory) and cache write-back operations (writes to memory). Given the cache architecture of these microprocessors, where most data and instruction retrieval is accomplished by way of cache memory, the performance provided by performing burst memory accesses for cache operations is quite high.
Caching typically works quite well for "true" memory locations, to and from which only the microprocessor writes and reads data using conventional memory access operations, because the microprocessor can ensure that its cache copy of the memory location matches the copy in main memory. So long as the cache and main memory copies of the same memory locations are the same, reading of the cache copy instead of the main memory copy will have no side effects. However, certain memory locations, such as those containing the status of an I/O device or those portions of the screen buffer that may be changed by a graphics accelerator, are volatile to the extent that cache copies of these memory locations would be frequently out-of-date. The reading of a cache copy of these volatile memory locations, in lieu of the main memory locations, could have significant side effects in system operation. Accordingly, accesses by the microprocessor to these volatile locations are blocked from being "cacheable" (i.e., from being stored in cache memory) in conventional IBM PC architecture systems, typically by the operation of a memory controller.
For example, a memory-mapped register is generally blocked from cacheable access, despite being accessed via conventional memory access operations, because the memory-mapped register is often polled to detect changes in device status, responsive to which certain control functions are effected. If the memory-mapped register were cached, changes in device status would be reflected in the main memory copy of the memory-mapped register but not in the cached copy; periodic polling of the memory-mapped register would read the cache copy only, and would therefore not detect the sought-for change in device status, effectively bringing control to a standstill. By way of another example, the caching of non-memory devices such as memory-mapped I/O functions may cause additional side effects for those types of I/O devices which change state in response to a read operation on the bus, since reads of on-chip cache memory do not appear as bus cycles. Write-back caching also presents side effects for these non-memory locations, as the cache could contain a more up-to-date copy than main memory; since writes to write-back cache do not appear on the bus, the caching of these locations would appear to reorder writes performed on the bus.
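The polling hazard described above may be illustrated with a small model. This is a sketch under stated assumptions: the `Device`, `Reader`, and `poll_until_ready` names are hypothetical, the device becomes ready at a fixed poll count, and the cached copy is never refreshed once filled.

```python
class Device:
    """Hypothetical device exposing one memory-mapped status register."""

    def __init__(self):
        self.status = 0   # 0 = busy, 1 = ready


class Reader:
    """Reads the status register, optionally through a one-entry cache."""

    def __init__(self, device, cacheable):
        self.device = device
        self.cacheable = cacheable
        self._cached = None

    def read_status(self):
        if not self.cacheable:
            return self.device.status            # every read reaches the device
        if self._cached is None:
            self._cached = self.device.status    # first read fills the cache
        return self._cached                      # later reads never reach it


def poll_until_ready(reader, device, ready_after, max_polls):
    """Return the poll count at which 'ready' is seen, or None if never."""
    for i in range(max_polls):
        if i == ready_after:
            device.status = 1    # the device changes state on its own
        if reader.read_status() == 1:
            return i
    return None


if __name__ == "__main__":
    for cacheable in (False, True):
        dev = Device()
        seen = poll_until_ready(Reader(dev, cacheable), dev,
                                ready_after=3, max_polls=10)
        print(f"cacheable={cacheable}: ready observed at poll {seen}")
```

With the register treated as cacheable, the polling loop continues to read the stale copy and never observes the change in status, which corresponds to the standstill described in the text.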
Another example of a memory area that is volatile and therefore typically blocked from cacheable access is video memory, which is logically within the memory map of the microprocessor and physically located either within or separate from main memory (such as in a graphics adapter). Video memory is often under the control of a device other than the microprocessor, such as a graphics processor or graphics adapter, and is therefore not suitable for cacheable access by the main microprocessor as its contents are frequently changed outside of the control of the microprocessor. If a portion of video memory were to be stored in the microprocessor cache, the cache contents would likely be invalid for subsequent accesses because of the changes made by the graphics processor.
In conventional x86 architecture microprocessors, therefore, burstable memory accesses are linked to the cacheability of the memory location to be accessed. For example, the PENTIUM microprocessor requests a burstable memory access by asserting a control signal at terminal CACHE# (the # indicating that the signal is active at a low logic level) during an access to memory (indicated by the microprocessor presenting a high logic level at terminal M/IO#). Responsive to this request, the memory controller determines if the memory address presented by the microprocessor is in a cacheable area of the memory space and, if so, asserts the KEN# input to the microprocessor and effects the burst access. According to this conventional implementation, if the microprocessor requests a burstable access to an area of memory that is blocked from cacheable access, the memory controller will not effect a burstable access, and will indicate the same by deasserting KEN#. Single transfer access to the desired memory location will then proceed.
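The decision made by the memory controller in this conventional arrangement may be sketched as follows. The memory map, the `Region` type, and the `controller_response` function are hypothetical and serve only to illustrate the linkage between cacheability and burst access; active-low signals are modeled as 0 when asserted.

```python
from dataclasses import dataclass

ASSERTED = 0      # active-low signals (CACHE#, KEN#) are asserted at logic 0
DEASSERTED = 1


@dataclass(frozen=True)
class Region:
    start: int        # first address in the region
    end: int          # one past the last address in the region
    cacheable: bool


# Illustrative memory map only (an assumption, not taken from the text).
EXAMPLE_MAP = (
    Region(0x00000, 0xA0000, cacheable=True),    # ordinary system memory
    Region(0xA0000, 0xC0000, cacheable=False),   # e.g. a video memory window
)


def controller_response(address, cache_n, m_io_n, regions=EXAMPLE_MAP):
    """Return (ken_n, transfer) for one access presented by the processor.

    cache_n is the CACHE# level and m_io_n is the M/IO# level (1 = memory).
    Addresses outside every region are treated as non-cacheable (assumed).
    """
    burst_requested = (m_io_n == 1) and (cache_n == ASSERTED)
    cacheable = any(r.start <= address < r.end and r.cacheable
                    for r in regions)
    if burst_requested and cacheable:
        return ASSERTED, "burst"       # KEN# asserted, burst access effected
    return DEASSERTED, "single"        # KEN# deasserted, single transfer


if __name__ == "__main__":
    for addr in (0x01000, 0xA0000):
        ken_n, transfer = controller_response(addr, ASSERTED, 1)
        print(f"address {addr:#07x}: KEN#={ken_n}, {transfer} transfer")
```

The sketch makes the limitation plain: a request directed to a non-cacheable region falls back to single transfers regardless of whether a burst would otherwise be possible.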
By way of further background, still another consideration in the complexity of cache architectures is the existence, and sometimes the requirement, of snoop capability. Snooping is known in the art, and generally includes two different types of snoop requests, each of which is associated with one or more memory locations identified by an explicit address that accompanies or corresponds to the snoop request. For example, a snoop request may be issued including an address, where the entire cache line which includes the addressed information is to be operated upon in response to the snoop request. In any event, as to the two types of snoop requests, generally a first type of such a request indicates to a cache (or caches) that the requesting circuit seeks to share the addressed information. If a cache does not have a copy of the addressed information, then it simply takes no action with respect to the request. On the other hand, if the cache does have a copy of the addressed information and that information has been modified since the time it was stored in the cache, then the cache outputs the addressed information to main memory. Therefore, the requesting circuit may then read the addressed information from main memory (or "snarf" it from the bus as it is being written from the cache to the main memory). The second type of snoop request indicates to a cache (or caches) that the requesting circuit seeks the addressed information and will change that information. Once again, if a cache does not have a copy of the addressed information, then it simply takes no action with respect to the request. On the other hand, if the cache does have a copy of the addressed information and that information has been modified since the time it was stored in the cache, then the cache also outputs the addressed information to main memory.
However, note that because the requesting circuit will change the information, the outputting cache also must invalidate its own copy of the information to prevent subsequent use of information that has been changed.
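The handling of the two types of snoop requests described above may be sketched as follows. The `Snoop` and `Cache` names are hypothetical. Two behaviors in the sketch are assumptions not stated in the text: a line is marked unmodified after it is written back, and an unmodified copy is also invalidated in response to the second type of request.

```python
from enum import Enum


class Snoop(Enum):
    SHARE = 1    # first type: requester seeks to share the information
    MODIFY = 2   # second type: requester will change the information


class Cache:
    """Minimal cache model: address -> [data, modified flag]."""

    def __init__(self):
        self.lines = {}

    def snoop(self, kind, address, main_memory):
        """Apply one snoop request and return a description of the action."""
        line = self.lines.get(address)
        if line is None:
            return "no action"              # no copy held: nothing to do
        data, modified = line
        actions = []
        if modified:
            main_memory[address] = data     # output the data to main memory
            line[1] = False                 # assumed: copy is now clean
            actions.append("write-back")
        if kind is Snoop.MODIFY:
            del self.lines[address]         # copy is about to become stale
            actions.append("invalidate")    # assumed for clean copies as well
        return "+".join(actions) or "no action"


if __name__ == "__main__":
    memory = {0x100: "old"}
    cache = Cache()
    cache.lines[0x100] = ["new", True]      # modified since it was cached
    print(cache.snoop(Snoop.SHARE, 0x100, memory), memory[0x100])
    cache.lines[0x100] = ["newer", True]
    print(cache.snoop(Snoop.MODIFY, 0x100, memory), 0x100 in cache.lines)
```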
Given the existence of snooping as introduced above, note also that it may further limit the types of information that are considered cacheable under current architectures. For example, assume there is a device which is external to a microprocessor and operates to alter data of the main memory used by the microprocessor. Assume further that this external device does not provide snooping capability. Therefore, the types of data associated with this external device are typically deemed non-cacheable, because to allow otherwise would create the danger that data of the type altered by this external device would exist in a cache on the microprocessor and would not be properly output by the cache to the main memory, owing to the lack of snooping capability of the device. Thus, snooping provides yet another complexity in the consideration of cache techniques and limitations.
In view of the above, the present inventors have recognized various limitations of the above factors regarding cacheability. Thus, below are presented various inventive embodiments which improve performance as measured against these prior art drawbacks.