The invention relates generally to the prefetching of data for access by a processor. More particularly, the invention relates to predicting the next data fetch so that the predicted data can be prefetched to the lowest level cache before the predicted data is requested by the processor.
Processor instruction execution speeds are much faster than the time required to access instructions from a computer's main memory. The slower main memory access time can create performance bottlenecks when a processor is forced to wait for fetched instructions to be transferred from the main memory to the processor. To minimize the gap between processor speed and main memory access time, higher speed cache memory is used to temporarily buffer instructions such that cached instructions are supplied to the processor with minimal time delay. FIG. 1 is a depiction of a typical processor and memory arrangement that utilizes multilevel cache memory to supply a processor. In FIG. 1, the processor 10 is connected to a level zero (L0) cache 14, a level one (L1) cache 16, and a main memory 18 by a bus 22. Other configurations are possible and may have, for example, the L0 cache located on the same chip as the processor and connected to the processor by on-chip circuitry or the cache levels may be directly connected to the processor. The processor can be any processor, often referred to as a microprocessor or a central processing unit, that processes computer code such as assembly language code. The cache memory is often high speed memory, such as static random access memory (SRAM), and the main memory can be, for example, dynamic random access memory (DRAM), and/or flash memory. The cache memory is typically more expensive to build than the main memory and therefore, the cache memory is usually sized to store only a small portion of the main memory storage capacity.
In typical computer systems, assembly language instructions are delivered to the processor from memory and then executed by the processor. Referring to FIG. 2, a typical assembly language instruction 26 includes an opcode portion 28 and an operand portion 30. The opcode, for operation code, informs the processor of what operation is to be performed. Opcode instructions include, for example, load instructions, add instructions, and subtract instructions. Referring to FIG. 3, a typical instruction 32 includes an opcode 38 that is referenced by a program counter (PC) 36. The program counter is an instruction location indicator that identifies the address within the memory of the desired instruction and the instruction directs the performance of functions, such as loading data, adding data, or subtracting data. The operand includes the symbolic name for the memory address of the data that is to be operated on by the instruction or in some cases the memory address of another instruction. Referring to FIG. 3, the operand may include information on the source address 40 or addresses and the destination address 42 or addresses, where the source address is the location of the data that is to be operated on by the instruction, and where the destination address is the target location for data that is the result of the current operations. The source and destination addresses may include addressing modes, where the addressing modes are algorithms that determine the proper source or destination address for data stored within the memory. Data addressing modes can be categorized as random access addressing modes or deterministic addressing modes. Random access addressing modes include absolute addressing, register indirect addressing, and base plus offset addressing. Deterministic addressing modes include sequential addressing modes such as register indirect addressing with pre/post incrementing, circular addressing, and/or bit reverse addressing.
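By way of illustration only, the deterministic addressing modes named above can be sketched as small functions that compute the next address from the current one. The function names, the parameters (step, base, length, bits), and the example addresses are assumptions made for this sketch and are not taken from any particular instruction set.

```python
def post_increment(addr, step):
    """Register indirect with post-increment: the next access is `step` bytes on."""
    return addr + step


def circular(addr, step, base, length):
    """Circular addressing: advance by `step`, wrapping inside [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Bit reverse addressing: mirror the low `bits` bits of a sequential index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# The access order produced by bit reverse addressing over an 8-element buffer.
order = [bit_reverse(i, 3) for i in range(8)]
```

In each case the next address follows from the current address and fixed parameters alone. An absolute or base plus offset access, by contrast, carries no rule that relates one execution's address to the next.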
Referring back to FIG. 1, since cache memory cannot store the total volume of information that is stored in the main memory 18, all of the information required by the processor 10 cannot be stored in the L0 cache 14 at the same time and cache misses will result when the processor fetches data that is not stored in the L0 cache. In order to increase the L0 cache hit ratio, instructions and/or data can be prefetched from the main memory to the L0 cache 14 or L1 cache 16 in anticipation of a data fetch by the processor. Prefetching of instructions to the cache is made easier by the sequential nature of computer program instruction execution. That is, computer programs often run routines that utilize program instructions in sequential order and as a result, a string of instructions can be prefetched from the main memory to the cache with some degree of confidence that the instructions will soon be needed by the processor. Branch target buffers can be used for prefetching instructions that do not exhibit sequential characteristics.
In contrast to prefetching instructions, data is often accessed in a more random manner, such that prefetching is more difficult to perform. One common technique used in prefetching data is that when a cache miss occurs, the current cache line is filled from the main memory with the desired data and a next cache line is filled with a block of data from the main memory that is spatially close to the missed data. Although the block caching approach may work well for some applications, it has disadvantages. Specifically, the block of supplemental data is prefetched from the main memory without any knowledge of the data access pattern of the current program and as a consequence, if the currently accessed data element is not part of a sequential data structure, the data prefetch may be filling the cache with unneeded data in the place of data that may soon be needed by the processor.
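The trade-off of the block approach can be shown with a minimal sketch, assuming a 32-byte cache line and a cache modeled as an unbounded set of line addresses. Both are simplifications chosen for the example; the helper name run_block_prefetch is hypothetical.

```python
LINE_SIZE = 32  # bytes per cache line; an assumed value


def run_block_prefetch(addresses):
    """Replay accesses; on each miss fill the missed line and the next line.

    Returns (hits, lines_fetched). The cache is an unbounded set, so the
    sketch shows wasted fetches but not evictions.
    """
    cache = set()
    hits = 0
    for addr in addresses:
        line = addr - addr % LINE_SIZE
        if line in cache:
            hits += 1
        else:
            cache.add(line)              # demand fill of the missed line
            cache.add(line + LINE_SIZE)  # block prefetch of the adjacent line
    return hits, len(cache)


sequential = run_block_prefetch([0, 32, 64, 96])
scattered = run_block_prefetch([0, 4096, 8192, 12288])
```

With the sequential trace, every second access hits on a prefetched line. With the scattered trace, no access hits and half of the fetched lines are never used, which in a cache of finite size would displace data that may be needed.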
In addition to block data prefetching, other techniques for data prefetching involve recognizing access patterns that have developed from previous data accesses and then extrapolating the recognized pattern to generate new prefetch addresses. For example, a pattern recognition technique is disclosed in U.S. Pat. No. 5,694,568, entitled “Prefetch System Applicable to Complex Memory Access Schemes,” issued to Harrison, III et al. Although this technique may work well for its intended purpose, the technique relies on recognizing access patterns based on past data accesses where the past patterns may inaccurately predict future data access patterns.
In view of the shortcomings of the known prior art, what is needed is a method and apparatus for prefetching data that provide a high cache hit ratio.
A method and apparatus for prefetching data to a low level memory of a computer system utilize an instruction location indicator related to an upcoming instruction to identify a next data prefetch indicator and then utilize the next data prefetch indicator to locate the corresponding prefetch data within the main memory of the computer system. The prefetch data is located so that the prefetch data can be transferred to the low level memory, where the data can be quickly accessed by a processor before the upcoming instruction is executed. The next data prefetch indicator is generated by carrying out the addressing mode function that is embedded in an instruction only when the addressing mode of the instruction is a deterministic addressing mode such as a sequential addressing mode. The next data prefetch indicator is identified by the instruction location indicator by relating corresponding next data prefetch indicators to instruction location indicators in a searchable table.
In the preferred embodiment, a data prefetch prediction table is generated that enables the next data prefetch indicator to be identified based on the program counter of an instruction that is soon to be executed. Entries in the data prefetch prediction table are formed from instructions that utilize deterministic addressing modes for identifying the effective address of the source data. The data prefetch prediction table entries include a program counter tag and a next data prefetch indicator. The program counter tag is the program counter related to the present instruction and the program counter tag allows the data prefetch prediction table to be searched by the program counter that is related to a particular instruction. The next data prefetch indicator is the effective address of the data that is likely to be required the next time the same instruction is executed. The next data prefetch indicator is calculated by carrying out the addressing mode function that is associated with the instruction. Since the addressing mode function is a deterministic function, there is a high likelihood that the calculated next effective address will be the actual effective address that is fetched the next time the instruction with the same program counter value is executed.
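A minimal sketch of such a table follows, assuming a Python dictionary keyed by program counter in place of dedicated hardware. The class and field names (ExecutedInstruction, PrefetchPredictionTable, addressing_fn) are hypothetical, and a value of None is the convention assumed here for marking an instruction whose addressing mode is not deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ExecutedInstruction:
    pc: int                 # program counter of the instruction
    effective_address: int  # source address used on this execution
    # Addressing mode function; None marks a non-deterministic mode.
    addressing_fn: Optional[Callable[[int], int]]


class PrefetchPredictionTable:
    """Maps a program counter tag to a next data prefetch indicator."""

    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}

    def update(self, instr: ExecutedInstruction) -> None:
        # Only instructions with deterministic addressing form entries.
        if instr.addressing_fn is None:
            return
        self.entries[instr.pc] = instr.addressing_fn(instr.effective_address)

    def lookup(self, pc: int) -> Optional[int]:
        return self.entries.get(pc)


table = PrefetchPredictionTable()
# A load at PC 0x400 that post-increments its source pointer by 4 bytes.
table.update(ExecutedInstruction(0x400, 0x1000, lambda a: a + 4))
# A load at PC 0x404 using absolute addressing; no entry is formed.
table.update(ExecutedInstruction(0x404, 0x2000, None))
```

After these two updates, a lookup of 0x400 yields 0x1004, the address the load is expected to use on its next execution, while a lookup of 0x404 yields nothing.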
The elements of a computer system in accordance with a preferred embodiment of the invention include a processor, a level zero cache, a level one cache, a main memory, and a data prefetch engine. In the preferred embodiment, the processor is any conventional processor having a program counter that identifies the address of instructions that are to be executed. The level zero cache is preferably SRAM that provides the fastest data transfer rate to the processor and that is located physically close to the processor. The level one cache is preferably SRAM that provides a slower data transfer rate to the processor and that is located on a system motherboard connected to the processor by a system bus. The main memory is a large capacity memory that provides a relatively slow transfer of data to the processor. The main memory may include DRAM, flash memory or other suitable memory types. The main memory is connected to the processor by a system bus.
The data prefetch engine is preferably integrated with the processor and manages the prefetching of data between the level zero cache, the level one cache, and the main memory. The data prefetch engine utilizes a next data prefetch controller, a data prefetch predictor, and a refill manager to predict the effective address of the next desired data memory reference and to transfer the data corresponding to the predicted data memory reference to the lowest level cache in order to create the best chance for a cache hit upon the execution of a particular instruction.
The next data prefetch controller screens out instructions having non-deterministic addressing modes and uses instructions having deterministic addressing modes such as sequential addressing modes to build a data prefetch prediction table that is used to predict the next prefetch. Generation of a data prefetch prediction table entry involves calculating the next effective address related to the present instruction by carrying out the addressing mode function related to the present instruction. The data prefetch predictor utilizes the data prefetch prediction table formed by the next data prefetch controller to rapidly identify the next effective address for a data prefetch related to an upcoming instruction. The data prefetch predictor maintains the data prefetch prediction table in a content-addressable memory that can be quickly searched by program counter tag. A refill manager of the data prefetch engine is responsible for transferring prefetch data that is not found in the lowest level cache to the lowest level cache when a prefetch miss occurs at the lowest level cache. The refill manager generates prefetch requests for higher level memory until the desired prefetch data is located and transferred to the lowest level cache.
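The role of the refill manager can be sketched as a walk through the memory levels from lowest to highest. The sketch assumes each level is a dictionary from address to data and that only the lowest level is filled on a refill; the function name refill and the example values are illustrative only.

```python
def refill(address, levels):
    """Search levels (lowest first) and copy the data into the lowest level.

    Returns the index of the level where the data was found, or None if
    no level holds the address.
    """
    for depth, level in enumerate(levels):
        if address in level:
            if depth > 0:
                # Prefetch miss at the lowest level: transfer the data down.
                levels[0][address] = level[address]
            return depth
    return None


l0_cache, l1_cache = {}, {}
main_memory = {0x1004: 42}
found_at = refill(0x1004, [l0_cache, l1_cache, main_memory])
```

In this example the data is located in the main memory (index 2) and copied into the level zero cache, so a repeated request for the same address is satisfied at index 0.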
In operation, the next data prefetch controller generates the data prefetch prediction table utilizing executed instructions that exhibit deterministic addressing modes. The data prefetch prediction table is located in the data prefetch predictor and is constantly updated by the next data prefetch controller. When a new instruction is identified as an instruction that will soon be executed, the program counter related to the instruction is forwarded to the data prefetch predictor. The program counter related to the instruction is used by the data prefetch predictor to search the program counter tag column of the data prefetch prediction table for a matching program counter tag. If a matching program counter tag is identified, the next data prefetch indicator is extracted from the table entry and the indicator is used to search the lowest level cache in the computer system for a cache line that matches the effective address of the next data prefetch indicator. If a cache hit occurs in the lowest level cache, no further prefetching is required. On the other hand, if a cache miss occurs in the lowest level cache, then the refill manager generates a prefetch request utilizing the next data prefetch indicator that enables the higher level memories to be searched for data with the corresponding effective address. Once the data with the corresponding next effective address is located, the refill manager transfers the located data to the lowest level cache in the computer system. With the target prefetch data transferred to the lowest level cache, the prefetch process related to the current instruction is complete. When the current instruction is finally executed by the processor, there is a higher probability that the data requested by the current instruction will be located in the lowest level cache, thereby allowing the fastest data access time possible.
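The effect of this sequence on a loop can be illustrated with a short simulation. It assumes a single load instruction executed repeatedly with post-increment addressing, a level zero cache that holds individual words rather than cache lines, and a single higher level memory. The function name simulate and the default values are hypothetical.

```python
def simulate(iterations, step=4, start=0x1000, pc=0x400):
    """Count lowest level cache hits for one load executed `iterations` times."""
    main_memory = {start + i * step: i for i in range(iterations + 1)}
    l0_cache = {}
    table = {}  # program counter tag -> next data prefetch indicator
    hits = 0
    addr = start
    for _ in range(iterations):
        # Prefetch phase: the upcoming program counter is looked up first.
        target = table.get(pc)
        if target is not None and target not in l0_cache:
            l0_cache[target] = main_memory[target]  # refill on prefetch miss
        # Execute phase: the load requests its actual effective address.
        if addr in l0_cache:
            hits += 1
        else:
            l0_cache[addr] = main_memory[addr]  # ordinary demand fetch
        # Table update: carry out the addressing mode function.
        table[pc] = addr + step
        addr += step
    return hits


hits_in_loop = simulate(8)
```

Under these assumptions only the first execution misses, because no table entry exists yet; each later execution finds its data already transferred to the lowest level cache.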