The invention relates generally to cacheable memory within computer systems and more particularly to providing cacheable data to a peripheral device.
Many computer systems include a processor, a memory unit, and a graphics controller. The processor performs computations on data that the processor accesses in the memory unit. The graphics controller obtains data from the memory unit and displays the data on, for example, a computer monitor. Many computer systems increase processor computation speed by providing a cache between the processor and the memory unit. The cache is a high-speed random access memory (RAM) that the processor can access more quickly than it can access the memory unit.
A computer system implementing a write-through cache scheme provides data from the processor directly to the memory unit, and then fills caches with data from the memory unit. The memory unit therefore always contains a recent version of the data. Such systems are generally slower than computer systems implementing a write-back or copy-back cache scheme. Computer systems implementing a write-back or copy-back cache scheme provide data from the processor directly to the cache or caches, and then write data from the caches to the memory unit only when it is necessary. Although the memory unit does not always contain a recent version of the data, the computer system can generally operate more quickly in a write-back cache scheme, since the processor can access data almost immediately after writing the data.
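The difference between the two schemes can be illustrated with a minimal sketch. The class and method names below (Memory, WriteThroughCache, WriteBackCache, write_back) are hypothetical and chosen only for illustration; the model tracks single addresses rather than cache lines and is not intended to describe any particular hardware.

```python
class Memory:
    """Toy memory unit that counts how many writes reach it."""

    def __init__(self):
        self.cells = {}
        self.writes = 0

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value
        self.writes += 1


class WriteThroughCache:
    """Every processor write goes to memory; the cache is filled from memory."""

    def __init__(self, memory):
        self.memory = memory
        self.data = {}

    def write(self, addr, value):
        self.memory.write(addr, value)             # memory is updated first
        self.data[addr] = self.memory.read(addr)   # then the cache is filled

    def read(self, addr):
        if addr not in self.data:
            self.data[addr] = self.memory.read(addr)
        return self.data[addr]


class WriteBackCache:
    """Processor writes stay in the cache until write_back() is called."""

    def __init__(self, memory):
        self.memory = memory
        self.data = {}
        self.dirty = set()  # addresses whose cached value is newer than memory

    def write(self, addr, value):
        self.data[addr] = value
        self.dirty.add(addr)

    def read(self, addr):
        if addr not in self.data:
            self.data[addr] = self.memory.read(addr)
        return self.data[addr]

    def write_back(self):
        # Copy only the modified entries to the memory unit.
        for addr in sorted(self.dirty):
            self.memory.write(addr, self.data[addr])
        self.dirty.clear()
```

In this sketch, three successive writes to one address cost three memory writes under write-through, but only one memory write under write-back, at the price of the memory unit holding an out-of-date value until write_back() runs.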
The cache typically includes data that the processor has written recently, organized into cache lines, limited by the capacity of the cache. A cache typically includes several cache lines, each cache line holding 32 bytes of data. Each cache line corresponds to a small contiguous range of addresses within the memory unit. A cache controller, typically found within the processor, manages cache lines within the cache.
In some computer systems, caches and even cache lines have attributes. Some attributes indicate whether a particular cache, or a particular cache line, is to be considered write-back or write-through. For example, some caches (or some cache lines within a cache) within a computer system may be write-back, while other caches or cache lines are write-through. Other attributes indicate a cache state of a cache or cache line. Cache states include, for example, the MESI (“Modified/Exclusive/Shared/Invalid”) state or “dirty” state of a cache line in snoopable multi-cache computer systems. Attributes are typically stored either in a portion of the cache itself or in a portion of the memory that is reserved for the cache.
Attributes are typically set upon configuration. When power is initially applied to the computer system, the computer system initiates configuration and executes a basic input/output system (BIOS) including a power-on self-test (POST), and then launches an operating system. The BIOS and operating system include routines that determine what resources are available within the computer system, and create files in the memory that allow the computer system to identify and use the resources. Conventionally, the BIOS or operating system sets cache attributes and memory attributes during this configuration process.
“Prefetching” is a mechanism for making data available to a processor before the processor requests the data. The cache controller prefetches data by copying the data from the memory unit into the cache line. Whenever the processor accesses data in a cache, prefetching fills the remaining locations within the cache line with data from nearby locations. According to the well-known principles of temporal and spatial locality, a processor that accesses data in one location is likely to access other data stored in a nearby location soon afterward. Prefetching reads data from the memory unit into a cache line whenever the processor accesses the cache line. Prefetched data is immediately available to the processor without additional memory unit access delays.
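The line-fill behavior described above can be sketched as follows, assuming 32-byte cache lines as in the earlier discussion. The names LineFillCache, line_base, and read_byte are hypothetical, and the memory unit is modeled as a plain byte sequence.

```python
LINE_SIZE = 32  # bytes per cache line


def line_base(addr):
    """Return the first address of the cache line that contains addr."""
    return addr - (addr % LINE_SIZE)


class LineFillCache:
    """Toy read cache that fills a whole line on the first access to it."""

    def __init__(self, memory):
        self.memory = memory      # bytes-like object standing in for the memory unit
        self.lines = {}           # line base address -> 32 bytes of data
        self.memory_reads = 0     # number of line fills from the memory unit

    def read_byte(self, addr):
        base = line_base(addr)
        if base not in self.lines:
            # Miss: copy the requested byte and its neighbors in one fill.
            self.lines[base] = bytes(self.memory[base:base + LINE_SIZE])
            self.memory_reads += 1
        return self.lines[base][addr - base]
```

With this model, reading address 70 fills the line covering addresses 64 through 95, so later reads anywhere in that range are served from the cache without another memory access.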
When the processor writes to the cache, the cache controller determines whether the address of the data falls within any of the ranges of data addresses corresponding to any of the cache lines already residing within the cache. If the address falls within such a range of addresses, the data to be written immediately replaces the data within the appropriate cache line located within the cache. If the address does not fall within such a range of addresses, then the cache controller first fetches the appropriate portion from the memory unit to create the cache line within the cache. The new data to be stored then replaces the data within the new cache line (that has been prefetched).
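The write path described above, in which a miss first fetches the line and the new data then overwrites part of it, can be sketched as follows. The names WriteAllocateCache and write_byte are hypothetical, the 32-byte line size is taken from the earlier discussion, and the memory unit is modeled as a bytearray.

```python
LINE_SIZE = 32  # bytes per cache line


class WriteAllocateCache:
    """Toy cache showing the hit and miss paths of a processor write."""

    def __init__(self, memory):
        self.memory = memory     # bytearray standing in for the memory unit
        self.lines = {}          # line base address -> bytearray of LINE_SIZE bytes
        self.line_fetches = 0    # number of lines fetched from the memory unit

    def write_byte(self, addr, value):
        """Write one byte; return True on a cache hit, False on a miss."""
        base = addr - (addr % LINE_SIZE)
        hit = base in self.lines
        if not hit:
            # Miss: fetch the surrounding line from memory to create it in the cache.
            self.lines[base] = bytearray(self.memory[base:base + LINE_SIZE])
            self.line_fetches += 1
        # Hit or freshly created line: the new data replaces the old data.
        self.lines[base][addr - base] = value
        return hit
```

Note that the sketch never writes to the memory unit; after a write, the cached line is newer than the copy in memory until the line is written back.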
If the cache is already saturated with data and has no available cache locations, then the cache controller pushes a cache line out of the cache and reallocates that cache line for the processor data and the prefetched data. Different computer systems use different algorithms for selecting the cache line to push. The pushed cache line is copied either into another cache or into the memory unit.
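As one illustration of how a saturated cache might select and push a line, the following sketch uses a least-recently-used policy. That policy is an assumption made for the example only, since the selection algorithm varies between systems; the names EvictingCache and store_line are likewise hypothetical, and the pushed line is always copied to the memory unit rather than to another cache.

```python
from collections import OrderedDict


class EvictingCache:
    """Toy cache that pushes out the least recently used line when full."""

    def __init__(self, memory, capacity):
        self.memory = memory          # dict: line base address -> line data
        self.capacity = capacity      # maximum number of cache lines
        self.lines = OrderedDict()    # oldest entry first, newest entry last

    def store_line(self, base, data):
        """Place a line in the cache; return the base of the pushed line, if any."""
        pushed = None
        if base in self.lines:
            # Line already present: mark it as most recently used.
            self.lines.move_to_end(base)
        elif len(self.lines) >= self.capacity:
            # Cache is saturated: push the oldest line out to the memory unit.
            pushed, old_data = self.lines.popitem(last=False)
            self.memory[pushed] = old_data
        self.lines[base] = data
        return pushed
```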
Processors can read data from their caches much more quickly than they can access the memory unit. In systems with multiple caches, cache controllers can snoop the memory unit bus for transactions that affect data contained within the cache and update their local copies of data accordingly. Cache incoherency can arise when a processor has a more current value in a local cache and some other peripheral or processor sees a different or “stale” value. Cache incoherency is not a problem unless a processor needs data that is only stored in caches with which the processor does not share a bus. When a processor does not share a bus with a cache containing data that the processor needs, the cache controller generally must flush the data to the memory unit.
In high performance computer systems that employ a write-back or copy-back caching algorithm, the computer system only writes data to the memory unit when the caches are all full or when a processor needing data in the cache does not share a bus with the cache containing the data. Otherwise, cache incoherency can develop. A cache line may become incoherent if its data has not been copied into the memory unit and some other unit within the computer system is requesting the same data. Cache flushing forces the cache controller to copy data from the cache into the memory unit, but it is used sparingly because writing to the memory unit is a time-consuming operation.
A graphics controller generally contains a processor or other bus master that requires data from the processor. The graphics controller is often implemented on a separate card, such as an expansion card, and frequently operates in parallel with the processor but at a different speed or synchronization. Accordingly, the graphics controller generally cannot share a common bus with the processor, due to the complexity of the interconnections that would be necessary. The complexity of providing interconnections to allow the graphics controller to snoop the processor bus is prohibitive. To provide data to the graphics controller, the processor generally stores the data in the memory unit. The graphics controller obtains graphics or drawing command data directly from the memory unit, or across a bridge. Bridges such as the Intel Northbridge couple the processor (or caches) on a processor bus, the memory unit on a memory bus, and a graphics controller on a graphics bus or peripheral component interconnect (PCI) bus.
Many computer systems use memory mapping to address peripheral devices such as a graphics controller. In such computer systems, the processor may attempt to write to addresses that do not exist in memory. The computer system has a hardware implementation that routes such write commands to the appropriate peripheral devices. When the peripheral device is busy or operates asynchronously with the processor, the processor may store the data in a reserved portion of the memory unit. The memory unit of such computer systems is divided into portions, each portion of the memory unit being reserved for a distinct one of the peripheral devices.
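The routing of memory-mapped writes can be sketched in software as follows, although in the systems described it is performed by hardware. The names Device, AddressDecoder, and map_device, the address values, and the single-byte write granularity are all assumptions made for illustration.

```python
class Device:
    """Toy peripheral that records the writes routed to it."""

    def __init__(self, name):
        self.name = name
        self.received = []   # list of (offset within window, value)

    def write(self, offset, value):
        self.received.append((offset, value))


class AddressDecoder:
    """Routes each write either to memory or to a memory-mapped device."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.windows = []    # list of (start, end, device)

    def map_device(self, start, size, device):
        self.windows.append((start, start + size, device))

    def write(self, addr, value):
        """Perform the write and return the name of the target that received it."""
        if addr < len(self.ram):
            self.ram[addr] = value
            return "ram"
        # The address does not exist in memory: look for a device window.
        for start, end, device in self.windows:
            if start <= addr < end:
                device.write(addr - start, value)
                return device.name
        raise ValueError(f"unmapped address {addr:#x}")
```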
Thus, consider an example of a computer system in which the processor needs to supply graphics data and drawing commands to a graphics controller via the memory unit, and the graphics controller subsequently and asynchronously requests data from the same memory unit. The processor is burdened with making sure graphics data and drawing commands are kept coherent with data that is potentially located within a cache. Some computer systems solve this problem by forcing the portion of memory that is accessed by a peripheral device (in this example, a graphics controller) to be un-cacheable. This results in poor processor bus utilization and lower efficiency, since numerous small transactions occur. Other computer systems solve this problem by forcing the software associated with the peripheral device (often called a driver) to issue a cache flushing instruction upon completing a cache line. This results in additional overhead during memory write transactions and can temporarily stall the processor during time-critical operations. Another problem with this scheme is that the cache will eventually become saturated with graphics data and drawing commands that will not be referenced by the processor again.
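The trade-off between the two conventional approaches can be made concrete with a rough counting model. The model assumes 32-byte cache lines, 4-byte processor stores, one bus transaction per un-cached store, and one burst plus one flush instruction per cache line; these figures and the function names are illustrative assumptions, not measurements of any real system.

```python
LINE_SIZE = 32   # bytes per cache line
WORD_SIZE = 4    # bytes per processor store (assumed for this model)


def uncacheable_cost(num_words):
    """Each store goes straight to memory as its own small bus transaction."""
    return {"bus_transactions": num_words, "flush_instructions": 0}


def flush_per_line_cost(num_words):
    """Stores collect in the cache; the driver flushes each line it fills."""
    words_per_line = LINE_SIZE // WORD_SIZE
    lines = -(-num_words // words_per_line)  # ceiling division
    # A final, partly filled line still has to be flushed to reach memory.
    return {"bus_transactions": lines, "flush_instructions": lines}
```

Under these assumptions, writing 64 words of drawing commands costs 64 small bus transactions in the un-cacheable case, versus 8 line-sized transfers and 8 explicit flush instructions in the driver-flush case. The model does not capture the processor stalls or the cache saturation noted above.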
Consequently, there is a need for a computer system that:
(a) promptly makes the most recent version of the graphics data or drawing commands available to the graphics controller,
(b) achieves a high rate of data transfer,
(c) does not unnecessarily impose on the processor