A computing system, such as a personal computer, a work station, a laptop computer, a handheld computer, a personal digital assistant (PDA), a video game system, etc., is known to include a processing device (e.g., a central processing unit), associated memory (e.g., system memory), video graphics circuitry, and peripheral ports. The peripheral ports enable the central processing unit to communicate with peripheral devices such as printers, monitors, keyboards, joysticks, a mouse, etc. The video graphics circuitry processes graphics data, which it receives from the central processing unit, and/or video data to produce display data. The video graphics circuitry receives the video data from a video decoder that is operably coupled to receive video signals from video sources, such as television broadcast signals, cable signals, satellite signals, etc.
As is also known, the central processing unit and/or video graphics circuitry utilizes memory to store programming instructions, intermediate processed data, and processed data. To improve the speed of memory access, most computing systems include cache memory. Cache memory temporarily stores data from a larger, slower memory to allow the central processing unit and/or video graphics circuitry to access the data at higher speeds than if it were to access the data from the larger memory. The higher speeds are due, at least in part, to the cache memory being smaller than the larger memory and to its being coupled to the central processing unit and/or video graphics circuitry via dedicated buses.
Utilization of cache memory is most effective when the data retrieved from the slower memory (e.g., the system memory) will be used repeatedly by the host processor (e.g., the central processing unit and/or video graphics circuitry). When the host processor needs access to data, a command is provided to a memory controller that retrieves the data from memory and provides it to the cache. This data transfer requires at least the same number of clock cycles as if the host processor were to access the memory directly. Once the data is in the cache memory, the host processor may access it in much less time than accessing the memory directly. As such, the less transferring of data between the cache and memory and the more the host processor utilizes the data within the cache, the more efficient the cache memory system.
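The efficiency argument above can be illustrated with a simple cost model. The following is a minimal sketch, not a description of any particular system: the function name and the cycle counts are hypothetical, and it assumes that a miss costs one full access to the slower memory (to fill the cache) followed by one cache access.

```python
def average_access_cycles(hits, misses, cache_cycles, memory_cycles):
    """Average cycles per access under a simple, assumed cost model.

    A hit costs one cache access. A miss costs a full access to the
    slower memory (the fill) plus one cache access to read the data.
    """
    total = hits + misses
    if total == 0:
        raise ValueError("at least one access is required")
    miss_cost = memory_cycles + cache_cycles
    return (hits * cache_cycles + misses * miss_cost) / total


# Hypothetical timings: 1 cycle for the cache, 10 cycles for the slower memory.
mostly_hits = average_access_cycles(hits=9, misses=1, cache_cycles=1, memory_cycles=10)
all_misses = average_access_cycles(hits=0, misses=10, cache_cycles=1, memory_cycles=10)
```

Under these assumed timings, data that is reused nine times out of ten averages 2 cycles per access, while data that is never reused averages 11 cycles, which is slower than accessing the memory directly.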
To ensure that the cache memory is utilized more efficiently, the cache memory is associated with the larger, slower memory in predetermined manners. For example, a group of locations in the system memory is typically associated with a section of the cache memory. This association improves cache utilization efficiency in that, typically, when the host processor requires data, the data is stored in the same memory block or within a few memory blocks (e.g., the memory groupings). As such, when a memory block's worth of data is retrieved, the host processor is most likely going to need the other data in that block for subsequent operations. Once the host processor has completed its operations upon the data stored in cache, it flags the data to be flushed to the slower memory. Once the data has been flushed, if the host processor needs data contained within the particular memory block, the memory block must be refilled from the slower memory to the cache. If the memory block is repeatedly filled into and flushed from cache memory, a process that is known as thrashing, the cache memory system is not being utilized in an efficient manner.
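Thrashing can be illustrated with a small simulation. The sketch below assumes a direct-mapped association, in which each memory block maps to exactly one cache section via a modulo operation; the text above does not specify the mapping, so this choice, the class name, and the block numbers are all hypothetical.

```python
class DirectMappedCache:
    """Toy cache in which block b may only occupy section b % num_sections."""

    def __init__(self, num_sections):
        self.num_sections = num_sections
        self.sections = [None] * num_sections  # block number held by each section
        self.fills = 0     # blocks brought in from the slower memory
        self.flushes = 0   # blocks pushed out to make room

    def access(self, block):
        """Return True on a hit; on a miss, flush the occupant and fill the block."""
        index = block % self.num_sections
        if self.sections[index] == block:
            return True
        if self.sections[index] is not None:
            self.flushes += 1
        self.sections[index] = block
        self.fills += 1
        return False


# Blocks 0 and 4 compete for the same section of a 4-section cache.
cache = DirectMappedCache(num_sections=4)
results = [cache.access(block) for block in [0, 4, 0, 4, 0, 4]]
```

In this example every access misses: the two blocks evict each other in turn, giving six fills and five flushes for six accesses, even though three of the four sections are never used.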
To help reduce thrashing, some computing systems include a victim cache. The victim cache is a separate memory device that stores the memory block of data that has been flushed. As such, when a memory block is flushed, it is not flushed directly to the slower memory; it is first flushed to the victim cache. (Note that the victim cache may be set up as a write-through cache such that the data is flushed to the victim cache and the slower memory at the same time.) The victim cache can be organized as a fully associative first-in/first-out buffer, and it will store memory block data until it is full. When full, it flushes the oldest memory data block to the slower memory. If the host processor desires memory block data while the victim cache is storing it, the memory block data is provided from the victim cache to the cache. Note that the system memory controller coordinates the transfer of data between the cache and victim cache and that, to the host processor, the victim cache does not appear to exist.
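The first-in/first-out behavior described above can be sketched as follows. This is an illustrative model of a non-write-through victim cache only; the class and method names are hypothetical, and a block found in the victim cache is assumed to be removed from it when it is returned to the cache.

```python
from collections import OrderedDict


class VictimCache:
    """Fully associative FIFO buffer for blocks flushed from the main cache."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> data, oldest entry first
        self.written_back = []       # (address, data) pairs sent to slower memory

    def insert(self, address, data):
        """Accept a block flushed from the cache, evicting the oldest if full."""
        self.blocks.pop(address, None)  # drop any stale copy of this block
        if len(self.blocks) >= self.capacity:
            oldest = self.blocks.popitem(last=False)  # FIFO: oldest leaves first
            self.written_back.append(oldest)
        self.blocks[address] = data

    def lookup(self, address):
        """Return the block's data and remove it, or None if it is not held."""
        return self.blocks.pop(address, None)


victim = VictimCache(capacity=2)
victim.insert(0, "block 0 data")
victim.insert(4, "block 4 data")
victim.insert(8, "block 8 data")   # buffer is full, so block 0 goes to slower memory
recovered = victim.lookup(4)       # served from the victim cache, not slower memory
```

Note that the third insert must first evict block 0 before block 8 can be stored, which is the two-step flush that a full victim cache imposes.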
While the victim cache reduces the latency encountered with thrashing, there is a processing issue once the victim cache is full and the cache has data to flush. For non-write-through victim caches, once the victim cache is full, it must be flushed before the cache can flush its data to the victim cache. As is known for this situation, the cache cannot be filled until it is flushed. Thus, the host processor is delayed for multiple cache flush operations before it can continue its processing with respect to the newly accessed data. Another issue with the victim cache is that, because it is fully associative, it utilizes physical addresses of the slower memory, which are significantly bigger than the logical cache addresses of the cache. As such, the memory controller requires larger adders, multipliers, etc. to process the addresses for the victim cache than for the cache.
Therefore, a need exists for a method and apparatus that extends a physical cache to reduce latency issues related to thrashing without the limitations of a victim cache.