1. Field of the Invention
The invention pertains generally to computer systems. In particular, it pertains to a write cache for writing data to memory.
2. Description of the Related Art
Because processors can typically operate at much faster speeds than their main memory, most computer systems now use high-speed cache memory as local memory that the processor can access for most of its needs. However, although cache memory is fast, it is also much more expensive than the dynamic random access memory (DRAM) typically used for main memory, and the amount of available cache memory is typically only a fraction of the amount of DRAM in the system. Since much software involves repetitive execution of the same code, it is feasible to copy the code about to be executed from main memory into cache memory, where it can then be repetitively executed at high speed. Because copying from a slower memory also takes time, many computer systems have a hierarchy with multiple levels of cache, with each level being faster and smaller than the level below it, and main memory at the bottom of the hierarchy.
Whenever a processor (CPU) or other device executes a write function, it is changing the contents of one or more memory locations. Due to the cached memory structure, this change happens first in the cache memory from which the processor is executing. This data must then be updated in main memory (and any lower levels of cache memory) to maintain consistency and preserve the change for future use. Since burst transfers are generally more efficient overall than individual word or byte transfers, the data is written back to main memory in blocks of predetermined size, with each block containing whatever changes were made to the data in that block.
Many conventional systems employ write-through cache. In a write-through cache memory system, each time data is written (i.e., changed) into cache, the changed cache line is written back to memory so that cache and main memory will be in agreement and other devices reading the changed memory location will not be reading "stale" data that is no longer correct. This is typically done by writing each changed block of data to a buffer, or queue, from where it can be written back to memory as the competing demands on the memory system allow.
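The write-through behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuitry of any particular system: the class name `WriteThroughCache`, the `LINE_SIZE` of four words, and the 16-word memory are all assumptions chosen for the example.

```python
from collections import deque

LINE_SIZE = 4  # words per cache line (hypothetical value for illustration)


class WriteThroughCache:
    """Toy model: every cache write also queues the changed line for memory."""

    def __init__(self, memory):
        self.memory = memory        # main memory, modeled as a list of words
        self.lines = {}             # line index -> list of cached words
        self.write_queue = deque()  # pending (line index, line data) entries

    def write(self, address, value):
        line_index, offset = divmod(address, LINE_SIZE)
        if line_index not in self.lines:
            # Fill the line from main memory on first use.
            base = line_index * LINE_SIZE
            self.lines[line_index] = self.memory[base:base + LINE_SIZE]
        self.lines[line_index][offset] = value
        # Write-through: queue a copy of the whole changed line.
        self.write_queue.append((line_index, list(self.lines[line_index])))

    def drain_one(self):
        """Write the oldest queued line back to main memory."""
        line_index, data = self.write_queue.popleft()
        base = line_index * LINE_SIZE
        self.memory[base:base + LINE_SIZE] = data


memory = [0] * 16
cache = WriteThroughCache(memory)
cache.write(5, 42)
cache.write(6, 7)
stale_before_drain = memory[5]  # main memory still holds the old value here
while cache.write_queue:
    cache.drain_one()
```

Until the queue is drained, a device reading address 5 directly from main memory in this model would see the old value, which is the "stale" data condition the queue is meant to resolve as memory bandwidth allows.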
A conventional system 10 is shown in FIG. 1. A CPU 11 is closely coupled to a cache memory 12, which contains the code and data currently in use and also the code and data that were recently used. Data that has been written to cache is also written to main memory 13 by transmitting it to I/O control logic 14, from where it is placed into write queue 16 to await its turn to be written into memory 13. As it exits write queue 16, the appropriate address and data signals are presented to memory controller 18, which opens the page in memory 13 and writes the data to the selected locations within that page. Graphics controller 17 can also read data from and write data to memory, as can multiple devices on the input-output (I/O) buses interfaced to bus controller 15, so I/O control logic 14 arbitrates the write requests from these various sources and places them into write queue 16.
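The flow from the write sources through write queue 16 to memory 13 can be modeled roughly as follows. The round-robin policy in `arbitrate`, the function names, and the sample addresses are assumptions made for this sketch; the description above does not specify how I/O control logic 14 arbitrates.

```python
from collections import deque
from itertools import zip_longest


def arbitrate(*sources):
    """Merge write requests from several sources into one FIFO queue.

    Round-robin is an assumed policy, used here only for illustration.
    """
    queue = deque()
    for requests_this_round in zip_longest(*sources):
        for request in requests_this_round:
            if request is not None:
                queue.append(request)
    return queue


def drain(write_queue, memory):
    """Write queued requests to memory strictly in queue order."""
    order = []
    while write_queue:
        address, value = write_queue.popleft()
        memory[address] = value
        order.append(address)
    return order


# Hypothetical (address, value) write requests from three sources.
cpu_writes = [(0, 1), (4, 3)]
graphics_writes = [(2048, 2)]
io_writes = [(4096, 5), (2052, 4)]

memory = {}
order = drain(arbitrate(cpu_writes, graphics_writes, io_writes), memory)
```

In this model the queue only serializes the requests. The addresses reach memory in the interleaved order in which they were accepted, with no regrouping by location.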
Since multiple devices can try to write data to main memory 13 at the same time, write queue 16 allows the memory system to collect these competing memory requests, but it does nothing to change the order or grouping of the data being written to memory. There are several deficiencies in this conventional process:
1) Since the various sections of memory that are being changed may belong in scattered pages of memory, multiple pages of memory must be sequentially opened and closed. Opening a page of memory is time-consuming; sequentially opening several can significantly affect the efficiency of memory operations.
2) Writing several blocks of data into a page separately is inefficient, but the order in which those blocks are accepted and written is somewhat random, and conventional systems have no mechanism to save up a group of related blocks and organize them for a smaller number of burst transmissions.
3) Various parts of the data in a single cache line may be written at different times. Initiating a separate block of writes for each one is inefficient, but conventional systems have no mechanism to collect separate partial writes to the same cache line for a single burst transmission.
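The cost of these deficiencies can be made concrete with a small counting sketch. It assumes a hypothetical 1024-byte memory page and 64-byte cache line, and it assumes the memory controller keeps a page open until a write to a different page arrives; none of these values come from the description above. The counts of distinct pages and lines are included only as a lower bound for comparison, not as a description of any particular remedy.

```python
PAGE_SIZE = 1024  # bytes per memory page (hypothetical)
LINE_SIZE = 64    # bytes per cache line (hypothetical)


def fifo_cost(addresses):
    """Count page opens and write transactions when writes go out in FIFO order."""
    page_opens = 0
    open_page = None
    for address in addresses:
        page = address // PAGE_SIZE
        if page != open_page:
            # Assumed controller behavior: a write to a different page
            # closes the current page and opens the new one.
            page_opens += 1
            open_page = page
    return {
        "page_opens": page_opens,
        "distinct_pages": len({a // PAGE_SIZE for a in addresses}),
        "write_transactions": len(addresses),
        "distinct_lines": len({a // LINE_SIZE for a in addresses}),
    }


# Writes arriving interleaved between two pages, several to the same line.
arrivals = [0, 4096, 8, 4104, 16, 64]
cost = fifo_cost(arrivals)
```

For this arrival order the model opens a page five times although only two distinct pages are touched, and it issues six separate write transactions although only three distinct cache lines are involved.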