Today's high-speed processors can execute program code and process data at rates much faster than data can be retrieved from or stored in main memory. To reduce the time the processor spends waiting to access memory, a high-speed memory “cache” acts as an intermediary between the processor and main memory. A cache may have a controller and a memory component. The cache memory contains a copy of a subset of the data in the main memory. The cache controller responds to memory access operations from the processor and, depending on what data is in the cache memory, may quickly access the cache memory in order to complete the memory operation. If the cache holds the data needed to respond to the memory access operation, it can respond more quickly than if the main memory must be accessed directly.
The cache controller, in addition to responding to memory access operations, maintains data in the cache, sometimes copying data from the main memory into the cache or writing back data from the cache into the main memory. The cache controller uses a mapping to keep track of which addresses of the main memory are “cached.” For example, a cache block (a memory unit of the cache) may be associated with an address in main memory. The cache controller may maintain a mapping that identifies associations between blocks of the cache and addresses in main memory. When a processor issues a memory access operation identifying an address in main memory, the cache controller can determine, based on the mapping, whether there is a block of cache memory associated with a portion of the main memory containing that address.
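The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how the mapping is organized, so the sketch assumes a direct-mapped layout in which each main memory block can occupy exactly one cache block. The names `BLOCK_SIZE`, `NUM_BLOCKS`, `mapping`, `lookup`, and `fill` are hypothetical.

```python
BLOCK_SIZE = 64   # bytes per cache block (assumed value)
NUM_BLOCKS = 4    # number of blocks in the cache (assumed value)

# mapping[i] holds the main-memory block number currently associated with
# cache block i, or None if cache block i is empty.
mapping = [None] * NUM_BLOCKS


def block_number(address):
    """Return the main-memory block number that contains `address`."""
    return address // BLOCK_SIZE


def lookup(address):
    """Return the index of the cache block holding `address`, or None on a miss."""
    mem_block = block_number(address)
    index = mem_block % NUM_BLOCKS      # direct-mapped placement (assumption)
    if mapping[index] == mem_block:
        return index
    return None


def fill(address):
    """Associate the cache block for `address` with its main-memory block."""
    mem_block = block_number(address)
    index = mem_block % NUM_BLOCKS
    mapping[index] = mem_block          # overwrites any previous association
    return index
```

After `fill(0x100)`, any address in the same 64-byte block (for example `0x13F`) is reported as cached, while an address in a neighboring block is not.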
Because the cache is almost always smaller than the main memory, a cache algorithm is used to select what subset of the main memory is maintained in the cache. Various cache algorithms are known, but each generally has as a goal increasing the likelihood that a memory access operation can be completed using data in the cache. In practice, however, the cache algorithm is imperfect and operations for uncached addresses are received. When an operation on an uncached address is received, the cache controller may copy data from the address of main memory into the cache. If all of the blocks of cache memory are full, the cache controller may be said to remove some addresses from the cache by writing over the data in blocks associated with those addresses with data from other addresses. The controller may then change the mapping to show the new addresses corresponding to data in that block. The cache algorithm may set a priority for determining which addresses to keep or remove from the cache when more data is to be cached than there are free blocks in the cache to hold it. When data in the cache is replaced with data at another address in the main memory in this fashion, the cache is said to evict the lower priority cached address.
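As one possible illustration of eviction, the sketch below uses a least-recently-used (LRU) priority. LRU is chosen here only as a familiar example; the text does not name a particular cache algorithm. The class `LRUCache`, its `evicted` list, and the dictionary standing in for main memory are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used address (assumed policy)."""

    def __init__(self, num_blocks, main_memory):
        self.num_blocks = num_blocks
        self.main_memory = main_memory   # dict: address -> data (stand-in)
        self.blocks = OrderedDict()      # cached address -> data, oldest first
        self.evicted = []                # record of evicted addresses

    def read(self, address):
        if address in self.blocks:
            # Hit: mark the address as most recently used.
            self.blocks.move_to_end(address)
            return self.blocks[address]
        # Miss: if every block is in use, evict the lowest-priority address.
        if len(self.blocks) >= self.num_blocks:
            victim, _ = self.blocks.popitem(last=False)
            self.evicted.append(victim)
        # Copy the data from main memory into the cache and update the mapping.
        self.blocks[address] = self.main_memory[address]
        return self.blocks[address]
```

With a two-block cache, reading addresses 0, 1, 0, 2 in that order evicts address 1, because address 0 was used more recently when the miss on address 2 occurred.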
The cache, processor and main memory are routinely involved in memory transfer operations. Memory access operations typically involve transmitting data between the processor, cache and main memory over one or more communication buses. Transfer operations may be initiated by the processor in the course of executing software. Common memory transfer operations include copy, move, swap, and zero.
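The four transfer operations named above can be sketched over a flat byte array standing in for main memory. The text does not define their exact semantics, so the sketch makes assumptions: regions do not overlap, and a move is treated as a copy followed by clearing the source. The function names and signatures are hypothetical.

```python
def copy(mem, src, dst, n):
    """Copy n bytes from src to dst; the source is left unchanged."""
    mem[dst:dst + n] = mem[src:src + n]


def zero(mem, start, n):
    """Set n bytes beginning at start to zero."""
    mem[start:start + n] = bytes(n)


def move(mem, src, dst, n):
    """Copy n bytes from src to dst, then clear the source.

    ASSUMPTION: clearing the source is one possible reading of "move";
    the regions are assumed not to overlap.
    """
    copy(mem, src, dst, n)
    zero(mem, src, n)


def swap(mem, a, b, n):
    """Exchange the n-byte regions at a and b (assumed not to overlap)."""
    tmp = bytes(mem[a:a + n])
    mem[a:a + n] = mem[b:b + n]
    mem[b:b + n] = tmp
```

Each function reads or writes every byte it touches, which is why such operations generate traffic between the processor, cache, and main memory.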
Other techniques for improving the efficiency of memory transfer operations, and for reducing the load on the processor, are also known. Programmed input/output (PIO) is a technology by which the processor may control the read and write operations needed to complete a memory transfer. Another technology is direct memory access (DMA). DMA allows hardware other than the processor to control the memory transfer operation. In both PIO and DMA operations, data may be communicated over a bus to which the processor is connected. This may slow the processor, because some of its own operations also require access to the processor bus and will contend for bus bandwidth.