The continuing development of computer systems has driven efforts to increase their performance and maximize their efficiency. One approach has been the creation and use of cache systems. The purpose of a cache system is to bring the speed of accessing memory as close as possible to the speed of the central processing unit (CPU) itself. By making instructions and data available to the CPU at a rapid rate, a cache increases the performance of the processor. A cache system has an access time that approaches that of CPU components and is often 5 to 10 times faster than the access time of main memory components. When the CPU makes a data request, the data can be found in one of the processor caches, in main memory, or in a physical storage system (such as a hard disk). There are usually several levels of cache, and each successive level of this hierarchy consists of progressively slower components. The L1 cache, which usually resides on the CPU, is the smallest. The larger L2 (second-level) cache may also be on the CPU or may be implemented off the CPU with SRAM. Main memory is much larger and consists of DRAM, and the physical storage system is larger still but much slower than the other storage areas. A cache system improves the performance of a computer system by predicting which data will be requested next and holding that data in the cache in advance, thus speeding execution. The data search begins in the L1 cache, then moves out to the L2 cache, then to DRAM, and then to physical storage.
A process known as “prefetching” is known in the art. Prefetching supplies memory data to CPU caches ahead of time to reduce microprocessor access time. By fetching data from a slower storage system and placing it in a faster access location, such as the L1 or L2 cache, the data can be retrieved more quickly when it is needed. Ideally, a system would prefetch the data and instructions that will be needed next far enough in advance that a copy would always be in the L1 cache when the CPU needed it. In practice, however, prefetching is speculative: it retrieves data that is anticipated, but not certain, to be needed by the microprocessor in subsequent cycles. Data prefetch mechanisms can be software controlled, by means of software instructions, or hardware controlled, using pattern detection hardware. Each of these prefetch mechanisms has certain limitations.
Software prefetch mechanisms typically use instructions such as Data Stream Touch (DST) to prefetch a block of data. Once the prefetch is started by the software instruction, hardware prefetches the entire block of data into the cache. If the block of data fetched is large relative to the size of the L1 cache, data currently being used by the CPU is likely to be displaced from the L1 cache. Displaced lines that are still needed will then have to be refetched by the CPU, slowing performance. In addition, software prefetch instructions can generate access patterns that use caches inefficiently when the cache lines are large, such as 128 bytes. For example, a DST instruction can specify a starting address, a block size (1 to 32 vectors, where a vector is 16 bytes), a number of blocks to prefetch (1 to 256 blocks), and a signed stride in bytes (−32768 to +32768). An access pattern that specifies blocks which span cache lines and are irregularly spaced relative to the cache lines will waste cache space, and because only a small part of each fetched cache line is actually used, performance will be lowered. Additionally, large amounts of hardware may be required to implement the full scope of the software prefetch instruction.
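The cache-space waste described above can be made concrete with a small byte-counting model. The sketch below is illustrative only: the function names `dst_blocks` and `line_utilization` are hypothetical, the 128-byte line size is assumed from the example above, and the model counts only which bytes a DST-style request covers, not how any real processor implements DST.

```python
VECTOR_BYTES = 16  # one vector is 16 bytes, as stated in the text
LINE_BYTES = 128   # assumed cache line size (the 128-byte example above)


def dst_blocks(start, block_vectors, num_blocks, stride):
    """Return (first_byte, last_byte) for each block a DST-style request covers.

    Parameter ranges follow the text: 1..32 vectors per block,
    1..256 blocks, and a signed stride of -32768..+32768 bytes.
    """
    if not 1 <= block_vectors <= 32:
        raise ValueError("block size must be 1..32 vectors")
    if not 1 <= num_blocks <= 256:
        raise ValueError("block count must be 1..256")
    if not -32768 <= stride <= 32768:
        raise ValueError("stride must be -32768..+32768 bytes")
    size = block_vectors * VECTOR_BYTES
    return [(start + i * stride, start + i * stride + size - 1)
            for i in range(num_blocks)]


def line_utilization(blocks, line_bytes=LINE_BYTES):
    """Fraction of the bytes in all touched cache lines that the blocks use."""
    used = set()   # distinct byte addresses actually requested
    lines = set()  # distinct cache-line indices that must be fetched
    for first, last in blocks:
        used.update(range(first, last + 1))
        lines.update(range(first // line_bytes, last // line_bytes + 1))
    return len(used) / (len(lines) * line_bytes)
```

With 64-byte blocks (4 vectors) at a 200-byte stride starting at address 96, the four blocks touch six 128-byte lines, so only a third of the fetched bytes are used. A 128-byte block at a 128-byte stride starting at address 0 uses every fetched byte.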
Hardware mechanisms prefetch a stream of data and generally prefetch only as far ahead as the cache and memories require. Because hardware mechanisms must first detect a stream, the stream logic has to generate enough prefetches to get the designated number of lines ahead of the actual processor accesses. Once the hardware is far enough ahead, lines are prefetched at the rate at which the processor consumes them. Often, however, especially when a hardware prefetch is first started, several prefetches may be active at once in order to get enough lines ahead of the actual processor accesses. Prefetching several streams at once can degrade overall processor performance, slowing both access to needed data and processing of that data. These problems are compounded in systems that prefetch data from a plurality of L1 and L2 caches, which are becoming more common in larger, faster systems having multiple processors.
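The start-up burst can be illustrated with a simplified model of a single ascending stream. This sketch is an assumption, not a description of any particular prefetch engine: `stream_prefetch_schedule` and its `depth` parameter (the designated number of lines to stay ahead) are hypothetical, and stream detection is taken to have already occurred at the first demand access.

```python
def stream_prefetch_schedule(first_line, num_accesses, depth):
    """Return, for each demand access, the list of line numbers prefetched.

    The processor is assumed to access lines first_line, first_line + 1, ...
    and the prefetcher tries to keep `depth` lines ahead of it.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    next_to_prefetch = first_line + 1  # first line not yet prefetched
    schedule = []
    for i in range(num_accesses):
        current = first_line + i       # line the processor demands now
        issued = []
        # Issue prefetches until the stream is `depth` lines ahead of `current`.
        while next_to_prefetch <= current + depth:
            issued.append(next_to_prefetch)
            next_to_prefetch += 1
        schedule.append(issued)
    return schedule
```

For a stream starting at line 100 with a depth of 4, the first access issues four prefetches (lines 101 to 104) and each later access issues only one. Several streams starting at the same time therefore multiply that initial burst of simultaneously active prefetches.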
With either software or hardware prefetch mechanisms, there is always a performance trade-off between using cache resources for prefetches and using them for CPU instruction or data requests. Therefore, what is needed is a system and method for efficiently utilizing prefetch logic so as to maximize CPU performance.
The present invention, accordingly, provides a method and apparatus for controlling the utilization of resources in a prefetching system. By limiting the amount of L2 cache resources that can be used by prefetch accesses, the amount of CPU resources used for prefetch requests and responses is kept at a level that does not exceed the processing and storage capabilities of the cache.
A method of the present invention controls the utilization of resources when prefetching data in a data processing system comprising a plurality of L1 caches and a plurality of L2 caches. The method comprises defining a maximum number of allowable L2 cache prefetches and monitoring the actual number of L2 cache prefetches. When the system receives a request for an L2 cache prefetch, it determines whether the current actual number of L2 cache prefetches is less than the defined maximum number of allowable L2 cache prefetches. If it is, the system permits prefetching the requested data to the L2 cache. If the actual number of L2 cache prefetches is equal to the defined maximum, the system delays prefetching the requested data to the L2 cache until at least one prefetch already in progress has completed.
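The method above amounts to a counter compared against a fixed limit, plus a holding area for delayed requests. The following is a minimal software model under stated assumptions: the class name `L2PrefetchThrottle` and its methods are hypothetical, delayed requests are assumed to restart in first-in, first-out order (the text says only that they wait until a prefetch completes), and a hardware implementation would use a counter and a request queue rather than Python objects.

```python
from collections import deque


class L2PrefetchThrottle:
    """Hypothetical model: caps in-flight L2 prefetches and delays the excess."""

    def __init__(self, max_prefetches):
        if max_prefetches < 1:
            raise ValueError("max_prefetches must be at least 1")
        self.max_prefetches = max_prefetches  # defined maximum allowable prefetches
        self.in_flight = 0                    # monitored actual number of prefetches
        self.delayed = deque()                # requests waiting (FIFO is an assumption)

    def request(self, addr):
        """Handle an L2 prefetch request; return True if it starts now."""
        if self.in_flight < self.max_prefetches:
            self.in_flight += 1               # below the limit: permit the prefetch
            return True
        self.delayed.append(addr)             # at the limit: delay the prefetch
        return False

    def complete(self):
        """Record one finished prefetch; start the oldest delayed one, if any.

        Returns the address of the prefetch started, or None.
        """
        if self.in_flight == 0:
            raise RuntimeError("no prefetch is in flight")
        self.in_flight -= 1
        if self.delayed:
            addr = self.delayed.popleft()
            self.in_flight += 1
            return addr
        return None
```

With a limit of 2, a third request is held until one of the first two completes; it then starts, and the in-flight count stays at the limit rather than exceeding it.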
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.