Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors—the “brains” of a computer—and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from a memory can often become a significant bottleneck on performance. To reduce this bottleneck, it is desirable to use the fastest memory devices available. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity with cost.
A predominant manner of obtaining such a balance is to use multiple “levels” of memories in a memory architecture to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device, an intermediate main memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices, and one or more high speed, limited capacity cache memories, or caches, implemented with static random access memory devices (SRAMs) or the like. In some instances, instructions and data are stored in separate instruction and data cache memories to permit instructions and data to be accessed in parallel. One or more memory controllers are then used to swap the information from segments of memory addresses, often known as “cache lines”, between the various memory levels to attempt to maximize the frequency with which requested memory addresses are stored in the fastest cache memory accessible by the microprocessor. Whenever a memory request attempts to access a memory address that is not cached in a cache memory, a “cache miss” occurs. As a result of a cache miss, the cache line for a memory address typically must be retrieved from a relatively slow, lower level memory, often with a significant performance hit.
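As an illustration of cache lines, hits and misses, the following is a minimal sketch of a direct-mapped cache lookup. The line size, the number of lines, the direct-mapped organization and the names `DirectMappedCache` and `access` are all assumptions chosen for the example; nothing above prescribes a particular cache organization.

```python
LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_LINES = 4    # number of lines in this toy cache (assumed)


class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags, not data."""

    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a cache miss."""
        line_number = address // LINE_SIZE   # which cache line the address falls in
        index = line_number % NUM_LINES      # slot the line maps to
        tag = line_number // NUM_LINES       # identifies which line occupies the slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the line would be fetched from a lower level memory,
        # replacing whatever previously occupied the slot.
        self.misses += 1
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0, 8, 64, 0, 256, 0)]
print(results, cache.hits, cache.misses)
```

In this trace, addresses 0 and 256 map to the same slot, so the access to 256 evicts the line holding address 0 and the final access to 0 misses again.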
In many multi-level memory architectures, a memory request is not forwarded to lower levels of memory until it is determined that a cache miss has occurred in a higher level cache memory. As a result, a delay is often introduced during this determination, an operation that is often referred to as a cache lookup, or snoop, operation. Other architectures attempt to eliminate this delay by speculatively issuing some memory requests to a lower level of memory concurrently with performing the cache lookup operation. In some instances, performance is increased for cache misses, as the lower level memory is able to begin processing the memory request prior to completion of the cache lookup operation in the higher level of memory. In other instances, however, performance can be decreased due to the fact that the lower level memory is required to process additional memory requests, which increases the workload of the lower level memory and decreases the available bandwidth of the memory buses that communicate the requests between components in the memory architecture. Given the fact that the additional memory requests are typically those memory requests that result in a cache hit on the upper level of memory, the results of processing the memory requests in the lower level memory are often never used, thus occupying system resources that could otherwise be used for more productive activities.
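The tradeoff described above can be sketched with a simple latency model. The cycle counts, the request trace and the function names are hypothetical, and the model assumes that a speculatively issued request overlaps fully with the cache lookup; it is a rough illustration, not a measurement of any real architecture.

```python
def serial_latency(hit, lookup, lower):
    """Forward the request to the lower level only after the lookup reports a miss."""
    return lookup if hit else lookup + lower


def speculative_latency(hit, lookup, lower):
    """Issue the request to the lower level concurrently with the lookup."""
    return lookup if hit else max(lookup, lower)


def lower_level_requests(hits, speculative):
    """Count the requests that the lower level memory must process."""
    if speculative:
        return len(hits)                      # every request is forwarded
    return sum(1 for h in hits if not h)      # only misses are forwarded


LOOKUP, LOWER = 4, 40               # assumed latencies, in cycles
trace = [True, True, True, False]   # assumed outcome: three hits, one miss

serial_total = sum(serial_latency(h, LOOKUP, LOWER) for h in trace)
speculative_total = sum(speculative_latency(h, LOOKUP, LOWER) for h in trace)
wasted = lower_level_requests(trace, True) - lower_level_requests(trace, False)
print(serial_total, speculative_total, wasted)
```

Under these assumptions, speculation saves the lookup delay on the single miss, but the lower level memory processes three additional requests whose results are never used.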
In addition, even in the event that speculatively issued memory requests are eventually used, some latency is still typically associated with the issuance of memory requests in a number of conventional memory architectures. Many architectures, for example, are pipelined such that requests are handled in a First-In-First-Out (FIFO) manner, i.e., the requests are communicated to a lower level memory in the order in which they were received. In some instances, however, memory requests that are directed to more performance-critical data are stalled waiting for less critical, but earlier issued, memory requests to be communicated to the lower level memory, thus reducing throughput in performance-critical areas.
As an example, in many architectures, memory write requests are often relatively low priority operations since the write requests are predominantly issued to update a copy of a cache line in a lower level memory after the cache line is no longer being used in the upper level memory. As a result, these memory write requests are often not as performance-critical as other types of requests, in particular read or load requests.
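The stall described above can be illustrated with a small sketch that compares strict FIFO issue order against a hypothetical ordering that issues reads first. The request tuples, the addresses and both function names are assumptions for the example, and the reads-first ordering is shown only for contrast; it ignores the address-dependency checks that a real design would need before letting a read pass an earlier write.

```python
from collections import deque


def fifo_issue_order(requests):
    """Issue requests strictly in the order in which they were received."""
    queue = deque(requests)
    order = []
    while queue:
        order.append(queue.popleft())
    return order


def reads_first_issue_order(requests):
    """Hypothetical alternative: reads ahead of writes, arrival order kept within each kind."""
    # sorted() is stable, so requests of the same kind keep their arrival order.
    return sorted(requests, key=lambda request: request[0] != "read")


# Assumed trace: a read arrives behind two earlier write requests.
requests = [("write", 0x100), ("write", 0x140), ("read", 0x200), ("write", 0x180)]
critical = ("read", 0x200)

fifo_wait = fifo_issue_order(requests).index(critical)
reordered_wait = reads_first_issue_order(requests).index(critical)
print(fifo_wait, reordered_wait)
```

Here the read waits behind two less critical writes under FIFO handling, whereas under the reads-first ordering it is issued immediately.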
Therefore, a need continues to exist in the art for reducing the latency associated with handling memory requests in a multi-level memory architecture.