A typical processing system with video/graphics display capability includes a central processing unit (CPU), a display controller coupled to the CPU by a CPU local bus (directly and/or through core logic), a system memory coupled to the CPU local bus through core logic, a frame buffer memory coupled to the display controller via a peripheral local bus (e.g., PCI bus), peripheral circuitry (e.g., clock drivers and signal converters, display driver circuitry), and a display unit.
The CPU is the system master and generally provides overall system control in conjunction with the software operating system. Among other things, the CPU communicates, normally through the core logic, with the system memory, which holds the instructions and data necessary for program execution. Typically, the core logic consists of two to seven chips, with one or more chips being "address and system controller intensive" and one or more other chips being "data path intensive." The CPU also, in response to user commands and program instructions, controls the contents of the graphics images to be displayed on the display unit by the display controller.
The display controller, which may be, for example, a video graphics architecture (VGA) controller, generally interfaces with the CPU and the display driver circuitry, manages the exchange of graphics and/or video data among the frame buffer, the CPU and the display during display data update and screen refresh operations, controls frame buffer memory operations, and performs additional basic processing on the subject graphics or video data. For example, the display controller may be capable of performing basic operations such as line draws and polygon fills. The display controller is for the most part a slave to the CPU.
Generally, improvements in access time to any of the system memory resources will increase system performance. For example, reduction in the amount of time required by the CPU/core logic to access given data from system memory will allow more data to be accessed during a given time period. Alternatively, faster memory access provides additional time during which the CPU and/or core logic can perform other critical tasks. One particular instance where improved access times can substantially improve system performance is during retrieval of data from system memory for storage in cache.
Most PC systems include one or two levels of data cache for improving access time to data by the CPU. The "L1" cache is normally integral to the CPU chip and consists of 8 to 16 kilobytes of fast static RAM (SRAM). The "L2" cache (when provided) is normally off-chip (coupled to the CPU and the core logic by the CPU local bus) and typically consists of 256 to 512 kilobytes of fast SRAM. The SRAMs of the cache memory have substantially faster cycle times than the DRAMs of the system memory (e.g., 7 to 10 nsecs for a random access to an SRAM cache versus 110 to 130 nsecs for a random access, or 40 nsecs for a page access, to the system memory DRAM). Therefore, blocks of data are read from the system memory and written into the cache in anticipation of the data needs of the CPU. This "encachement" is typically done by the operating system as a function of such factors as the spatial and/or temporal locality of the data required by the CPU during a sequence of operations. If the CPU requires data for a given operation, and that data is already part of the encached block (i.e., a "cache hit" occurs), it can be accessed much faster than from the system memory. By selecting latency and density ratios between the system memory and the cache memory to be on the order of 10 to 1, and depending on the partitioning of the system memory by the operating system, the hit rate for reads to the cache can exceed 95%.
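Why a high hit rate matters can be illustrated with the common effective-access-time model, in which the average access time is the hit rate times the cache access time plus the miss rate times the system memory access time. The sketch below is a simplified illustration, not a model of any particular system: it assumes every miss costs exactly one random DRAM access, ignores line-fill and bus overhead, and uses representative values drawn from the ranges above (about 10 nsecs for SRAM, about 120 nsecs for a random DRAM access). The function name `effective_access_time` is hypothetical.

```python
def effective_access_time(hit_rate: float, t_cache_ns: float, t_mem_ns: float) -> float:
    """Average access time under a simple hit/miss model.

    Assumption: a miss costs one full system memory access; line-fill and
    bus overhead are ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache_ns + (1.0 - hit_rate) * t_mem_ns


if __name__ == "__main__":
    # Representative figures from the text: ~10 ns SRAM, ~120 ns random DRAM access.
    for h in (0.80, 0.90, 0.95):
        t = effective_access_time(h, 10.0, 120.0)
        print(f"hit rate {h:.0%}: {t:.1f} ns average")
```

Under these assumptions a 95% hit rate yields an average of about 15.5 nsecs per access, compared with 120 nsecs if every access went to the system memory DRAM, while an 80% hit rate already raises the average to about 32 nsecs.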
Thus, the need has arisen for improved methods and hardware for accessing memory. Such methods and hardware would be particularly useful in implementing unified system memory-frame buffer systems and for effectuating system memory-frame buffer data transfers in more conventional systems.
Additionally, as demands for system memory granularity, bandwidth and density increase, and as system clock speeds increase, the need has arisen to improve the efficiency of cache memories and cache memory operations. In particular, the need has arisen for circuits and methods for transferring data from the system memory to the cache during encachement operations.
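The cost of moving a block from system memory into the cache during encachement can be roughed out from the DRAM timings given earlier. The sketch below assumes, purely for illustration, that all words of a block lie in the same open DRAM page, so that in page mode only the first word pays the full random-access latency (about 120 nsecs) and each following word costs one page access (about 40 nsecs). The function `fill_time_ns` and the 8-word block size are hypothetical and do not describe any specific cache or core logic design.

```python
def fill_time_ns(words: int, t_random_ns: float, t_page_ns: float,
                 page_mode: bool = True) -> float:
    """Time to transfer one block of `words` words from DRAM into the cache.

    Assumption: every word of the block is in the same open DRAM page, so in
    page mode only the first access pays the full random-access latency.
    """
    if words < 0:
        raise ValueError("words must be non-negative")
    if words == 0:
        return 0.0
    if page_mode:
        return t_random_ns + (words - 1) * t_page_ns
    return words * t_random_ns


if __name__ == "__main__":
    # Representative DRAM timings from the text: ~120 ns random, ~40 ns page access.
    for mode in (False, True):
        t = fill_time_ns(8, 120.0, 40.0, page_mode=mode)
        label = "page mode" if mode else "random accesses only"
        print(f"8-word block, {label}: {t:.0f} ns")
```

With these figures an 8-word block takes about 400 nsecs using page accesses versus about 960 nsecs if each word required a separate random access, which suggests why the manner of transferring data to the cache can have a large effect on encachement efficiency.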