To build a large computing system capable of running many tasks concurrently, it is usually necessary to provide multiple copies of the data used by the tasks so that a copy is often physically close to the processor running a given task. These copies are stored in caches, which can be constructed in a variety of sizes and organizations. When a particular process needs a data item, one or more of these caches are searched to see whether they contain the desired data; if they do not, the request is passed to a memory controller that manages a much larger memory space known as main memory. The goal of maintaining multiple copies of data is to reduce the average amount of time it takes for a particular processor to access the data item that it needs. Searching the caches takes a certain amount of time, and if the desired data is not located, that time is added to the total access time required to retrieve the data from main memory. Thus it can be beneficial to start the access to main memory before it is known whether the desired data item is in one of the caches. This is known as a speculative read because, if the data is found in a cache, that data will be used and the data retrieved from main memory will be discarded. The other case, in which all caches that might possibly contain the desired data are searched before the access to main memory is started, is known as a demand read. The drawback of speculative reads is that they consume memory and bus resources, which are then not available for data requests from other processes.
Accordingly, there is a need in the art for proper weighting between demand and speculative reads to minimize read latency and maximize performance of a memory subsystem.
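By way of illustration only, the trade-off between demand and speculative reads can be sketched with a simple latency model. The sketch below is hypothetical and not drawn from any particular system: the function names and parameters are assumptions, latencies are in arbitrary time units, and a speculative read is assumed to start its main memory access at the same moment as the cache search, with the two proceeding fully in parallel.

```python
def demand_read_latency(cache_search, memory_access, hit):
    """Demand read: main memory is accessed only after the cache search misses."""
    if hit:
        return cache_search
    # On a miss, the search time is added to the main memory access time.
    return cache_search + memory_access


def speculative_read_latency(cache_search, memory_access, hit):
    """Speculative read: main memory access starts together with the cache search
    (an assumption of this sketch)."""
    if hit:
        # The cached copy is used; the data fetched from main memory is discarded.
        return cache_search
    # On a miss, the two operations overlap, so the slower one sets the latency.
    return max(cache_search, memory_access)


def average_latency(read_latency, cache_search, memory_access, hit_rate):
    """Expected latency of one read for a given cache hit rate (0.0 to 1.0)."""
    on_hit = read_latency(cache_search, memory_access, True)
    on_miss = read_latency(cache_search, memory_access, False)
    return hit_rate * on_hit + (1.0 - hit_rate) * on_miss


def discarded_fraction(hit_rate):
    """Fraction of speculative main memory accesses whose data is discarded.
    Under this sketch's assumptions, every cache hit wastes one memory access."""
    return hit_rate
```

Under these assumptions, a cache search of 10 units, a main memory access of 100 units, and a hit rate of 0.5 give an average latency of 60 units for demand reads and 55 units for speculative reads, while half of the speculative main memory accesses return data that is discarded. The model ignores contention, so it shows the latency benefit of speculation but not the cost that the wasted accesses impose on other processes.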