A known way to increase the performance of a computer system is to include a local, high-speed memory known as a cache. A cache increases system performance because there is a high probability that once the central processing unit (CPU) has accessed a data element at a particular address, its next access will be to an adjacent address. The cache fetches and stores data that is located adjacent to the requested piece of data from a slower main memory or lower-level cache. In very high performance computer systems, several caches may be placed in a hierarchy. The cache that is closest to the CPU, known as the upper-level or "L1" cache, is the highest-level cache in the hierarchy and is generally the fastest. Other, generally slower caches are then placed in descending order in the hierarchy, starting with the "L2" cache and continuing down to the lowest-level cache, which is connected to main memory. Note that typically the L1 cache is located on the same integrated circuit as the CPU and the L2 cache is located off-chip. However, as time passes, it is reasonable to expect that lower-level caches will eventually be combined with the CPU on the same chip.
Recently, microprocessors designed for desktop applications such as personal computers (PCs) have been modified to increase processing efficiency for multimedia applications. For example, a video program may be stored in a compression format known as the Moving Picture Experts Group MPEG-2 format. When processing the MPEG-2 data, the microprocessor must create frames of decompressed data quickly enough for display on the PC screen in real time. However, the latency in fetching data for the L2 cache may be as many as 100 to 150 processor clock cycles.
Even with aggressive out-of-order processor microarchitectures, it is difficult for the processor to make forward progress in program execution while waiting for data from long-latency memories when cache miss rates are significant.
To help hide this long main memory latency, many instruction set architectures have added instructions that serve only to prefetch data from memory into the processor's cache hierarchy. If software can predict far enough in advance the memory locations that the program will subsequently use, these instructions can be used to effectively hide the cache miss latency. This can be done because the software-directed prefetch mechanism uses only resources that service cache misses and does not tie up other valuable resources such as completion buffer entries and register renames.
One way of providing software prefetching has been classified as synchronous software-directed prefetching. The prefetching is synchronous because the prefetch hint usually specifies a small amount of memory and can be executed in program order like any other load instruction. In architectures such as the PowerPC architecture, available from Motorola, Inc. of Austin, Tex., instructions called data cache block touch and data cache block touch for store are examples of synchronous software prefetch instructions.
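By way of illustration only, synchronous software-directed prefetching may be sketched in C as follows. The sketch uses the GCC and Clang builtin `__builtin_prefetch`, which a compiler may lower to a touch-style instruction on targets that provide one. The function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance of 64 elements is an assumption; a suitable value depends on the memory latency and on the work done per loop iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements (an assumption;
   it must be tuned to the memory latency and the loop cost). */
#define PREFETCH_DISTANCE 64

/* Sums an array while issuing, in program order, a hint that the
   element PREFETCH_DISTANCE ahead will soon be read. */
long sum_with_prefetch(const long *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read;
               third argument 3 = high temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

Each hint covers a small amount of memory and executes in line with the loop's ordinary loads, which is what makes this style of prefetching synchronous. The hint does not change the result of the computation; it affects only when the data arrives in the cache.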
Another class of prefetch instructions is called data stream touch (DST). DST instructions are classified as asynchronous because the instructions can specify a very large amount of memory to be prefetched in increments of cache blocks by a DST controller. The DST controller runs independently of normal load and store instructions. That is, the controller runs in the background while the processor continues normally with the execution of other instructions. DST instructions are useful where memory accesses are predictable and can be used to speed up many applications, such as multimedia applications.
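The behavior of such a controller may be modeled, purely as a simplified sketch, by a descriptor that is set up once and then advanced one cache block at a time, independently of the instructions that follow. The type `stream_t`, the functions `stream_start` and `stream_step`, and the 32-byte value of `CACHE_BLOCK_BYTES` are all hypothetical; they do not represent the encoding or implementation of any actual DST instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK_BYTES 32u  /* assumed cache block size */

/* Hypothetical state held by a stream prefetch controller. */
typedef struct {
    uintptr_t next;        /* address of the next block to prefetch */
    size_t    blocks_left; /* cache blocks still to be requested    */
} stream_t;

/* Models the single instruction that describes a large region. */
void stream_start(stream_t *s, uintptr_t base, size_t bytes)
{
    assert(s != NULL);
    /* Align the start down to a cache block boundary. */
    s->next = base & ~(uintptr_t)(CACHE_BLOCK_BYTES - 1);
    if (bytes == 0) {
        s->blocks_left = 0;
        return;
    }
    /* Count the blocks needed to cover [base, base + bytes). */
    uintptr_t span = (base + bytes) - s->next;
    s->blocks_left = (size_t)((span + CACHE_BLOCK_BYTES - 1) / CACHE_BLOCK_BYTES);
}

/* Models one background step: requests one cache block, if any remain.
   Returns true and stores the block address in *issued when it does. */
bool stream_step(stream_t *s, uintptr_t *issued)
{
    assert(s != NULL && issued != NULL);
    if (s->blocks_left == 0) {
        return false;
    }
    *issued = s->next;
    s->next += CACHE_BLOCK_BYTES;
    s->blocks_left--;
    return true;
}
```

In this model, one call to `stream_start` stands in for the single instruction, and the repeated calls to `stream_step` stand in for work that the controller would perform in the background while the processor executes other instructions.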
However, the DST mechanism still requires resources for processing cache misses. These resources are also used for normal load and store operations. Examples of these miss resources are cache reload queue entries or miss queue entries. If the DST engine saturates the miss resources, such as the cache miss queue, the forward progress of normal loads and stores will be stopped because of full buffer conditions.
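The full buffer condition described above may be illustrated with a minimal model of a miss queue whose entries are shared by prefetch requests and normal (demand) loads and stores. The names `miss_queue_t`, `miss_queue_alloc`, and `miss_queue_release`, and the queue depth of four entries, are assumptions made only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MISS_QUEUE_ENTRIES 4  /* assumed queue depth */

typedef enum { REQ_DEMAND, REQ_PREFETCH } req_kind_t;

/* Hypothetical miss queue: only the number of occupied entries is tracked. */
typedef struct {
    int in_use;
} miss_queue_t;

/* Tries to allocate an entry for an outstanding miss.
   Demand and prefetch requests compete for the same entries. */
bool miss_queue_alloc(miss_queue_t *q, req_kind_t kind)
{
    assert(q != NULL);
    (void)kind;  /* the request kind makes no difference in this model */
    if (q->in_use == MISS_QUEUE_ENTRIES) {
        return false;  /* full buffer: the requester must stall */
    }
    q->in_use++;
    return true;
}

/* Frees an entry when its cache reload completes. */
void miss_queue_release(miss_queue_t *q)
{
    assert(q != NULL);
    if (q->in_use > 0) {
        q->in_use--;
    }
}
```

If four prefetch requests are allocated first, a subsequent demand request fails to obtain an entry and cannot proceed until a reload completes and an entry is released.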