Embodiments of the present invention relate to using memory in a computer. In particular, the present invention relates to allocating a portion of computer-system memory as a cache and mapping the allocated memory portion as reserved memory that cannot be accessed by the operating system.
With today's high-performance microprocessors, a popular technique for using memory involves caching. Typically, a memory cache interposes a block of fast memory, for example, high-speed Dynamic Random Access Memory ("DRAM"), between the microprocessor and a main memory. A special circuit called a cache controller attempts to keep the cache filled with the data or instructions that the microprocessor is likely to need next. If the information the microprocessor requests next is held within the DRAM of the cache, it can be retrieved without wait states. If, however, the information is not held in the DRAM of the cache, then the information can only be retrieved with wait states.
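By way of illustration only, the hit and miss behavior described above can be sketched as a toy software model. The class name `CacheModel`, the parameter `miss_wait_states`, the value of three wait states, and the policy of filling the cache on a miss are all assumptions made for this sketch; they are not taken from any particular cache controller.

```python
class CacheModel:
    """Toy model of a cache placed between a microprocessor and main memory."""

    def __init__(self, main_memory, miss_wait_states):
        self.main_memory = main_memory            # dict: address -> value
        self.miss_wait_states = miss_wait_states  # assumed cost of a miss
        self.lines = {}                           # contents of the fast memory

    def read(self, address):
        """Return (value, wait_states) for a single read request."""
        if address in self.lines:
            # Hit: the information is held in the cache, so no wait states.
            return self.lines[address], 0
        # Miss: fetch from main memory and pay the wait states.
        value = self.main_memory[address]
        self.lines[address] = value  # assumed fill-on-miss policy
        return value, self.miss_wait_states


def demo_cache_reads():
    cache = CacheModel({0x40: 11}, miss_wait_states=3)
    first = cache.read(0x40)   # miss: the cache starts out empty
    second = cache.read(0x40)  # hit: the line was filled by the first read
    return first, second
```

In this sketch the first read of an address incurs the assumed wait states and the second read of the same address incurs none.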
The logical configuration of a cache involves how the memory in the cache is arranged and how it is addressed, that is, how the microprocessor determines whether needed information is available inside the cache. The microprocessor is not the only component that can benefit from caching. For example, the graphics card, the component that writes graphics to the screen, can also benefit from caching.
FIG. 1 is an overview of a prior-art memory allocation. This figure shows that caches differ in the way they treat writing to memory. Most caches make no attempt to speed up write operations. Instead, they push write commands through a cache immediately, writing to cache and main memory at the same time. This write-through cache design guarantees that main memory and cache are in constant agreement. There is a faster alternative, however, called a write-back ("WB") memory 101. This WB memory 101 allows the microprocessor 100 to write changes to its cache memory and then immediately go back about its work. In FIG. 1, the microprocessor 100 has two cache memories, an on-die level 1 cache 106 which is integrated with the microprocessor 100, and a level 2 cache 107 that is external to the microprocessor 100.
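The difference between the write-through and write-back designs can be illustrated with a minimal software sketch. The class names, the `flush` method, and the use of dictionaries to stand in for cache and main memory are assumptions made for illustration and do not describe the hardware of FIG. 1.

```python
class WriteThroughCache:
    """Toy model: every write goes to the cache and to main memory together."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}

    def write(self, address, value):
        self.lines[address] = value
        self.main_memory[address] = value  # main memory is updated immediately


class WriteBackCache:
    """Toy model: writes stay in the cache until they are flushed."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}
        self.dirty = set()  # addresses whose main-memory copy is stale

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)  # the processor can continue at once

    def flush(self):
        # Copy every modified line back to main memory.
        for address in sorted(self.dirty):
            self.main_memory[address] = self.lines[address]
        self.dirty.clear()


def demo_write_policies():
    wt_memory = {0x10: 0}
    wb_memory = {0x10: 0}
    wt = WriteThroughCache(wt_memory)
    wb = WriteBackCache(wb_memory)
    wt.write(0x10, 7)
    wb.write(0x10, 7)
    before_flush = (wt_memory[0x10], wb_memory[0x10])
    wb.flush()
    after_flush = (wt_memory[0x10], wb_memory[0x10])
    return before_flush, after_flush
```

Before the flush, the write-through main memory already holds the new value while the write-back main memory still holds the old one, which is the disagreement between cache and main memory that a write-back design must manage.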
One problem with the WB memory 101 is that a main memory 102 and the microprocessor level 1 and level 2 cache memories 106 and 107, respectively, can have different contents assigned to the same memory locations. The level 1 and level 2 caches 106 and 107, respectively, of the microprocessor 100 must constantly be checked to ensure that the contents of the main memory 102 properly track any changes made in the level 1 and level 2 caches 106 and 107, respectively. This constant checking is called "snooping," and slows performance. The overhead associated with snooping is called "latency" and latency reduces the performance of the machine. For example, if the microprocessor 100 asks for one megabyte ("MB") of data that is marked as WB memory, chipset 104 will check both the WB memory 101 and the level 1 and level 2 caches 106 and 107, respectively, to see if the memory in WB memory 101 is up to date.
An existing solution to this problem is to use what is called a write-combining ("WC") memory 105 as shown in the prior art system in FIG. 1. The WC memory 105 is a weakly ordered memory type in which system memory locations are not cached and coherency is not enforced by the processor's bus-coherency protocol. When data is requested from the WC memory 105, the chipset does not snoop; that is, the chipset does not check to see if the memory is up to date, it simply reads the information stored in the WC memory 105. However, a problem with WC memories is that they are not always large enough to be useful to the various device drivers running on the computer, and are not always available to those device drivers.
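The contrast between a snooped read of WB memory and an unsnooped read of WC memory can be sketched as follows. The `Chipset` class, the `MemoryType` enumeration, and the `snoop_count` counter are hypothetical names introduced for this sketch, and the model deliberately ignores bus protocols and timing.

```python
from enum import Enum


class MemoryType(Enum):
    WRITE_BACK = "WB"
    WRITE_COMBINING = "WC"


class Chipset:
    """Toy model of a chipset that snoops only for write-back memory."""

    def __init__(self, main_memory, processor_caches):
        self.main_memory = main_memory            # dict: address -> value
        self.processor_caches = processor_caches  # list of dicts, level 1 first
        self.snoop_count = 0                      # number of cache checks made

    def read(self, address, memory_type):
        if memory_type is MemoryType.WRITE_BACK:
            # Snoop each processor cache for a newer copy of the data.
            for cache in self.processor_caches:
                self.snoop_count += 1
                if address in cache:
                    return cache[address]
        # WC memory (or WB memory with no cached copy): read memory directly.
        return self.main_memory[address]


def demo_reads():
    level1, level2 = {}, {0x20: 99}
    chipset = Chipset({0x20: 1, 0x30: 5}, [level1, level2])
    wb_value = chipset.read(0x20, MemoryType.WRITE_BACK)
    snoops_after_wb = chipset.snoop_count
    wc_value = chipset.read(0x30, MemoryType.WRITE_COMBINING)
    return wb_value, snoops_after_wb, wc_value, chipset.snoop_count
```

In this model the WB read checks both caches before returning the newer cached value, while the WC read returns the stored value without adding to the snoop count.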
In FIG. 1, a graphics accelerator/video controller 103 is shown, and is one of the most important chips on a video board. Graphics accelerators/video controllers can be designed to use standard DRAM, dual-ported video random access memory ("VRAM"), or either type. While VRAM memory delivers better performance, it is more expensive than DRAM. Therefore, if performance can somehow be enhanced using DRAM, for example, by enabling the graphics accelerator/video controller 103 to store and retrieve information directly from a high speed DRAM cache in main memory 102, a machine will be able to provide better performance at a lower price.
Therefore, it can be appreciated that a substantial need exists for a system and method which can advantageously use computer system memory DRAM as a high speed cache that is accessible to the various device drivers running on the computer system.
To overcome the problems in the prior art, a system and method is introduced for allocating system memory as a cache that is accessible to various peripherals. In one embodiment of the present invention, an amount of system memory is allocated for use as a Direct Memory Access ("DMA") buffer. The allocated memory is mapped as write combining, and the write combining memory is made available to a device driver.
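The three steps of this embodiment (allocating the buffer, mapping it as write combining, and making it available to a device driver) can be sketched as a simple software model. This is an illustration under stated assumptions, not the claimed implementation: the names `SystemMemoryModel`, `DmaBuffer`, `map_as_write_combining`, and `expose_to_driver`, the choice to carve the buffer from the top of memory, and the 64 MB and 4 MB sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class DmaBuffer:
    base: int                    # start address of the reserved region
    size: int                    # size of the region in bytes
    memory_type: str = "WB"      # assumed default type before remapping
    owner: Optional[str] = None  # device driver given access, if any


class SystemMemoryModel:
    """Toy model of system memory with a region hidden from the operating system."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.os_visible_size = total_size

    def allocate_dma_buffer(self, size):
        if size <= 0 or size > self.os_visible_size:
            raise ValueError("requested size does not fit in system memory")
        # Assumption: reserve the region at the top of memory so that the
        # operating system sees a smaller, contiguous range below it.
        self.os_visible_size -= size
        return DmaBuffer(base=self.os_visible_size, size=size)


def map_as_write_combining(buffer):
    buffer.memory_type = "WC"
    return buffer


def expose_to_driver(buffer, driver_name):
    buffer.owner = driver_name
    return buffer


def demo_allocation():
    memory = SystemMemoryModel(total_size=64 * MB)
    buffer = memory.allocate_dma_buffer(4 * MB)         # step 1: allocate
    buffer = map_as_write_combining(buffer)             # step 2: map as WC
    buffer = expose_to_driver(buffer, "graphics_driver")  # step 3: hand to driver
    return (memory.os_visible_size // MB, buffer.base // MB,
            buffer.memory_type, buffer.owner)
```

With the assumed sizes, the operating system is left with 60 MB of visible memory, and the 4 MB buffer above it is marked as write combining and associated with the hypothetical driver.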