1. Field of the Invention
This invention relates generally to the field of data processing systems, and, more particularly, to methods and apparatus for processing ordered data requests to a memory.
2. Description of the Related Art
The demand for quicker and more powerful personal computers has led to many technological advances in the computer industry, including the development of faster memories. Historically, the performance of a personal computer has been linked to the speed of memory accesses, both to retrieve the data manipulated by instructions and to retrieve the data encoding the instructions themselves. The performance of high-speed processors was hindered by slow data access times. To expedite data accesses, a fast memory known as “cache memory” was developed.
A cache memory is relatively small and operates at higher speed than a main memory due to either a more direct coupling to the processor or hardware adaptations. The cache memory stores the most recently utilized data blocks such that accessing these blocks is faster than accessing the main memory.
The use of cache memories ordinarily enables the processor to reduce the number of wait periods associated with retrieving data from memory. When a data requester issues a request for data, the cache memory determines whether the data is present in the cache memory. When the data is present, a situation referred to as a cache memory “hit” occurs, and the data is forwarded to the data requester with a relatively small wait. When the data is not present, a situation referred to as a cache memory “miss” occurs, and the cache memory performs several operations. First, the cache memory retrieves the requested data from a secondary memory. Then, the cache memory sends the requested data to the data requester and stores the retrieved data in the cache memory itself. The secondary memory may be a main memory or another cache memory, i.e., a multi-level cache memory. Retrieving data from the secondary memory is often a much slower operation than retrieving it from the cache memory.
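The hit and miss paths described above can be sketched in a few lines. This is a minimal illustration only, assuming a fully associative cache modeled as a dictionary and hypothetical latencies (`HIT_LATENCY`, `MISS_PENALTY`) that do not come from the text; the class name `SimpleCache` is likewise hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical latencies, in clock cycles, chosen only for illustration.
HIT_LATENCY = 4
MISS_PENALTY = 20


class SimpleCache:
    """Toy cache: a dict keyed by address, backed by a slower secondary memory."""

    def __init__(self, secondary: Dict[int, int]) -> None:
        self.secondary = secondary  # main memory or a lower cache level
        self.lines: Dict[int, int] = {}

    def read(self, addr: int) -> Tuple[int, bool, int]:
        """Return (data, hit, cycles) for a request to addr."""
        if addr in self.lines:
            # Cache hit: forward the data with a relatively small wait.
            return self.lines[addr], True, HIT_LATENCY
        # Cache miss: retrieve from secondary memory, keep a copy, then forward.
        data = self.secondary[addr]
        self.lines[addr] = data
        return data, False, HIT_LATENCY + MISS_PENALTY


cache = SimpleCache({0x40: 7})
print(cache.read(0x40))  # first access misses
print(cache.read(0x40))  # second access hits
```

The first read pays the miss penalty and fills the cache; the second read of the same address is served from the cache.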
Most cache memories have two subsystems, a “cache tag array” and a “cache data array.” The cache tag array stores entries for the secondary memory addresses associated with the data array entries. These addresses are used to determine whether a data request will result in a cache memory hit. The cache data array stores and delivers data in response to data requests. In multi-level cache memories, each cache data array has a corresponding tag array.
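The split between the tag array and the data array can be illustrated with a small direct-mapped cache. This is a sketch under assumptions not stated in the text: a direct-mapped organization, 64-byte blocks, and eight entries (`LINE_BYTES`, `NUM_SETS`, and `DirectMappedCache` are hypothetical names). Only the tag array is consulted to decide hit or miss; the data array supplies the block on a hit.

```python
from typing import List, Optional, Tuple

LINE_BYTES = 64   # assumed block size in bytes
NUM_SETS = 8      # assumed number of entries (direct-mapped)


class DirectMappedCache:
    def __init__(self) -> None:
        # Tag array: which secondary-memory block each entry currently holds.
        self.tags: List[Optional[int]] = [None] * NUM_SETS
        # Data array: the block contents themselves.
        self.data: List[Optional[bytes]] = [None] * NUM_SETS

    @staticmethod
    def split(addr: int) -> Tuple[int, int]:
        """Split an address into (index, tag)."""
        block = addr // LINE_BYTES
        return block % NUM_SETS, block // NUM_SETS

    def is_hit(self, addr: int) -> bool:
        # Hit/miss is decided from the tag array alone.
        index, tag = self.split(addr)
        return self.tags[index] == tag

    def fill(self, addr: int, block: bytes) -> None:
        # On a miss, the block fetched from secondary memory is installed.
        index, tag = self.split(addr)
        self.tags[index] = tag
        self.data[index] = block

    def read(self, addr: int) -> Optional[bytes]:
        # The data array delivers the block only on a hit.
        if not self.is_hit(addr):
            return None
        index, _ = self.split(addr)
        return self.data[index]


c = DirectMappedCache()
c.fill(0x1000, b"block")
print(c.is_hit(0x1010), c.read(0x1010))  # same 64-byte block as 0x1000
```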
Pipelines have further improved the performance of processors by dividing processing into stages and performing those stages in parallel. As opposed to serial processing, where all the stages complete the processing of one instruction before the processing of the next instruction begins, a pipelined device overlaps the stages by processing different instructions at the same time. The time needed to process each individual instruction remains unchanged, but the throughput for instruction processing is increased, because several instructions may be processed by different pipeline stages in parallel. Since data requests are repeatedly made to memories, pipelined data-request ports can likewise speed up the processing of data requests.
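The throughput argument can be made concrete with simple cycle counts. The sketch below assumes an idealized pipeline with no stalls, in which one request enters per clock; the function names are hypothetical. Each request still takes `stages` cycles, but a pipeline finishes a stream of `n` requests in `stages + n - 1` cycles rather than `n * stages`.

```python
def serial_cycles(n: int, stages: int) -> int:
    # Each request passes through every stage before the next one starts.
    return n * stages


def pipelined_cycles(n: int, stages: int) -> int:
    # The first request takes `stages` cycles; each later request
    # finishes one cycle after its predecessor.
    return stages + n - 1 if n > 0 else 0


print(serial_cycles(100, 4), pipelined_cycles(100, 4))
```

For 100 requests through four stages, this gives 400 cycles serially versus 103 cycles pipelined, while the latency of any one request stays at four cycles.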
FIG. 1A is a timing diagram for two serial data requests to a cache memory having a pipelined data-request port. The pipeline has four stages, i.e., a latency of four, and one data request can start at each clock cycle, i.e., a bandwidth of one per clock. The first and second requests are received at t=0 and at t=1, respectively. In the illustrated pipelined data-request port, the hit or miss status of a data request becomes known in the third stage. Thus, there is a lag of three clock cycles between the time at which the port starts to process a data request and the time at which it is known that the request can be completed without a slow data retrieval from a secondary memory.
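The stage occupancy of FIG. 1A can be tabulated programmatically. This sketch assumes stages are numbered 1 to 4 and that a request issued at cycle `t0` occupies stage `s` during cycle `t0 + s - 1`; under that convention, the request issued at t=0 reaches its hit/miss-determining third stage at t=2. The names `occupancy` and `status_known_at` are hypothetical.

```python
from typing import Dict, List, Tuple

STAGES = 4        # latency of four (FIG. 1A)
STATUS_STAGE = 3  # hit/miss status is determined in the third stage


def occupancy(issue_times: List[int]) -> Dict[int, List[Tuple[int, int]]]:
    """Map each clock cycle to the (request, stage) pairs active during it."""
    table: Dict[int, List[Tuple[int, int]]] = {}
    for req, t0 in enumerate(issue_times, start=1):
        for stage in range(1, STAGES + 1):
            table.setdefault(t0 + stage - 1, []).append((req, stage))
    return dict(sorted(table.items()))


def status_known_at(issue_time: int) -> int:
    # Cycle in which the request occupies its hit/miss-determining stage.
    return issue_time + STATUS_STAGE - 1


for t, active in occupancy([0, 1]).items():
    print(f"t={t}: {active}")
```

With requests issued at t=0 and t=1, the two requests overlap in cycles 1 through 3, which is the source of the bandwidth of one request per clock.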
Data requests can be either “ordered” or “unordered.” Ordering dictates the sequential order in which mutually ordered requests should be completed by the hardware. One example of an ordering relation requires that an earlier issued request, e.g., the first request of FIG. 1A, be completed before a later issued request, e.g., the second request of FIG. 1A. Other ordering relations exist, e.g., simultaneously issued data requests may be ordered according to program order. In the following, “earlier” ordered operations are defined to be operations that should complete before “later” ordered operations. “Earlier” and “later” are not limited to program ordering. If two requests are “unordered,” hardware may complete the two requests in any order. The ordering of data requests can slow the processing of data requests by a pipelined cache memory.
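An ordering relation can be represented as a set of (earlier, later) pairs and checked against observed completion times. This is a hypothetical helper, assuming that completing in the same cycle is acceptable and only a strictly earlier completion of the later request counts as a violation. Requests that appear in no pair are unordered and impose no constraint.

```python
from typing import Dict, Iterable, List, Tuple


def ordering_violations(
    completion: Dict[str, int],
    ordered_pairs: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the (earlier, later) pairs whose later request completed first."""
    return [
        (earlier, later)
        for earlier, later in ordered_pairs
        if completion[later] < completion[earlier]
    ]


# A finished at cycle 24, B at cycle 4.
print(ordering_violations({"A": 24, "B": 4}, [("A", "B")]))  # ordered: violation
print(ordering_violations({"A": 24, "B": 4}, []))            # unordered: no constraint
```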
FIG. 1B is a timing diagram that illustrates why processing ordered data requests may be problematic. The first and second data requests are, respectively, the earlier and later ordered requests to the cache memory of FIG. 1A, and are received at t=0 and t=1. In FIG. 1B, the first request results in a cache memory miss. The first request takes more than four cycles to complete, because the requested data must be retrieved from a slow secondary memory. On the other hand, the second request completes in four cycles, because it results in a cache memory hit. Thus, serially issuing ordered data requests can result in retrievals that violate ordering relations when cache memory misses occur.
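The scenario of FIG. 1B reduces to a comparison of two completion times. The sketch assumes a hypothetical fixed `MISS_PENALTY` of 10 extra cycles (the text gives no value) and treats a request as complete at the cycle boundary where it leaves the last pipeline stage.

```python
STAGES = 4
MISS_PENALTY = 10  # hypothetical extra cycles for a secondary-memory retrieval


def completion_time(issue: int, hit: bool) -> int:
    # A hit completes when it leaves the last stage; a miss also
    # waits for the data to arrive from secondary memory.
    return issue + STAGES + (0 if hit else MISS_PENALTY)


first = completion_time(issue=0, hit=False)  # earlier ordered request misses
second = completion_time(issue=1, hit=True)  # later ordered request hits
print(first, second, second < first)
```

The later request finishes at cycle 5 and the earlier one at cycle 14, so the later ordered request completes first, which is the ordering violation the figure illustrates.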
FIG. 1C shows one method for avoiding data retrievals that violate the ordering relationship. Issuance of the second, or later ordered, data request is delayed until t=3, i.e., until after the hit/miss status of the first request is known. In the illustrated pipelined cache memory, the second request waits three clock cycles until the hit/miss status of the first request is determined. The need to wait for the status of earlier requests reduces the speed at which ordered data requests are processed, i.e., increases their latency, and lessens the advantages of pipelining.
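The cost of the FIG. 1C approach grows with the length of a chain of ordered requests. This sketch assumes every request in the chain is mutually ordered and that each may issue only after its predecessor has passed the third (status) stage, so issue times advance by three cycles instead of one. The function name is hypothetical.

```python
from typing import List

STATUS_STAGE = 3  # hit/miss status is known in the third stage


def stalled_issue_times(n: int) -> List[int]:
    """Issue times for n mutually ordered requests, each waiting for the
    previous request's hit/miss status before it may start."""
    times: List[int] = []
    t = 0
    for _ in range(n):
        times.append(t)
        # The next request can start only after this one has been
        # through the status-determining stage.
        t += STATUS_STAGE
    return times


print(stalled_issue_times(4))  # compare with [0, 1, 2, 3] without stalling
```

Under these assumptions, the effective bandwidth for ordered requests drops from one request per clock to one request every three clocks.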
Multi-porting may further increase the speed of a memory by enabling several data requests to be processed during each clock cycle. FIG. 2A is a timing diagram for a doubled, four-stage pipelined data-request port in a cache memory. Two data requests can be received in each clock cycle. Thus, the doubled data-request port may double the throughput for data requests, i.e., the bandwidth is two per clock. Cache memory hits and misses are known at the third stage, i.e., after a lag of three clock cycles, a lag that can lead to problems with processing ordered data requests.
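The effect of a second port on throughput follows from the same idealized cycle count used for a single pipeline. This sketch assumes the requests are independent (no ordering stalls) and that `ports` requests can enter the pipeline each clock; the function name is hypothetical.

```python
import math


def multiported_cycles(n: int, stages: int, ports: int) -> int:
    """Cycles to finish n independent requests when `ports` of them
    can start in each clock cycle."""
    if n == 0:
        return 0
    issue_cycles = math.ceil(n / ports)  # clocks spent issuing requests
    return stages + issue_cycles - 1


print(multiported_cycles(100, 4, 1), multiported_cycles(100, 4, 2))
```

For 100 requests through four stages, one port needs 103 cycles and two ports need 53, approaching the doubling of throughput described above.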
FIG. 2B is a timing diagram illustrating one problem with processing ordered requests in the doubled pipelined data-request port of FIG. 2A. The first and second data requests are serially ordered, i.e., the first data request is the earlier request. At t=2, it is determined that the first data request will register a cache memory miss. The second data request registers a cache memory hit and can complete at t=3, i.e., before the earlier first request, because the first data request needs a slow data retrieval from secondary memory to complete. Ordered data requests cannot be processed by such a method in a multi-ported memory, because a later ordered request may complete before an earlier one due to a cache memory miss.
The time line of FIG. 2C illustrates a method of processing ordered requests in the doubled pipelined data-request port of FIG. 2A. In response to a cache memory miss for the earlier ordered data request, i.e., the first data request, all pending requests in the pipeline are flushed. The flushing eliminates ordering violations. However, flushing also reduces the speed of the memory and the advantages of pipelining, because some of the flushed requests may be unordered and could otherwise have completed.
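The waste caused by flushing can be quantified by counting the flushed requests that were not ordered with respect to the missing request. This sketch assumes that “all pending requests” means every request in the pipeline other than the one that missed, and that each request carries a hypothetical `ordered` flag indicating whether it is ordered relative to that miss.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    name: str
    ordered: bool  # True if ordered with respect to the missing request


def flush_on_miss(pending: List[Request], missed: str) -> Tuple[List[str], List[str]]:
    """Flush every pending request except the one that missed.

    Returns (flushed, needlessly_flushed), where the second list holds the
    unordered requests that could safely have continued.
    """
    flushed = [r for r in pending if r.name != missed]
    needless = [r.name for r in flushed if not r.ordered]
    return [r.name for r in flushed], needless


pipeline = [Request("A", True), Request("B", True), Request("C", False), Request("D", False)]
print(flush_on_miss(pipeline, missed="A"))
```

Here B is flushed to preserve ordering, but C and D are discarded only because the flush does not distinguish ordered from unordered requests.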
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
In one aspect of the present invention, a method is provided for requesting data from a memory. The method includes issuing a plurality of data requests to a data request port for the memory. The plurality of data requests includes at least two ordered data requests. The method includes determining if an earlier one of the ordered data requests corresponds to a miss in the memory, and converting a later one of the ordered data requests to a prefetch in response to the earlier one of the ordered data requests corresponding to a miss in the memory.
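The method summarized above can be sketched as a pass over a batch of requests in issue order. This is a minimal model under stated assumptions, not the claimed implementation: the cache is a set of resident addresses, a miss is assumed to fill its line, and a request converted to a prefetch brings its line into the cache but returns no data. How the converted request is later reissued is not modeled. The names `Req` and `process` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Req:
    name: str
    addr: int
    ordered: bool  # True if this request belongs to the ordered sequence


def process(requests: List[Req], cache: Set[int]) -> Dict[str, str]:
    """Return 'hit', 'miss', or 'prefetch' for each request (earliest first).

    Once an earlier ordered request misses, every later ordered request is
    converted to a prefetch, so it cannot complete ahead of that miss.
    Unordered requests are processed normally.
    """
    outcome: Dict[str, str] = {}
    ordered_miss_seen = False
    for r in requests:
        if r.ordered and ordered_miss_seen:
            outcome[r.name] = "prefetch"
            cache.add(r.addr)  # warm the line; no data is returned now
            continue
        if r.addr in cache:
            outcome[r.name] = "hit"
        else:
            outcome[r.name] = "miss"
            cache.add(r.addr)  # line is filled from secondary memory
            if r.ordered:
                ordered_miss_seen = True
    return outcome


reqs = [Req("A", 0x100, True), Req("B", 0x200, True), Req("C", 0x300, False)]
print(process(reqs, cache={0x200, 0x300}))
```

In this example the earlier ordered request A misses, so the later ordered request B is converted to a prefetch even though its line is resident, while the unordered request C still hits and completes. Unlike flushing, the unordered request is not discarded.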
In another aspect of the present invention, an apparatus is provided. The apparatus includes a memory having at least one pipelined port for receiving data requests. The port is adapted to determine whether an earlier ordered one of the data requests corresponds to a miss in the memory. The port converts a later ordered one of the data requests to a prefetch in response to determining that the earlier ordered one of the data requests corresponds to a miss in the memory.