1. Field of the Invention
This invention relates to computer systems and more particularly to code prefetching mechanisms and techniques employed within computer systems. The invention also relates to memory control techniques for computer systems.
2. Description of the Relevant Art
A variety of techniques have been developed to increase the overall processing speed of computer systems. While improvements in integrated circuit processing technologies such as submicron processing capabilities have made it possible to dramatically increase the speed of the integrated circuitry itself, other developments in the architectures and bus transfer mechanisms of computer systems have also led to improvements in performance. Exemplary developments include the incorporation of cache memory subsystems as well as code prefetching mechanisms within computer systems.
A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between a slower system memory and a microprocessor to improve effective memory transfer rates and, accordingly, system performance. The name refers to the fact that the small cache memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory. The cache is usually implemented with semiconductor memory devices having speeds comparable to the speed of the processor, while the system memory utilizes a less costly, lower-speed technology. The cache concept anticipates the likely reuse by the microprocessor of selected data and code in system memory by storing a copy of the selected data and code in the cache memory.
A cache memory typically includes a plurality of memory sections, wherein each memory section stores a block or a "line" of two or more words. (As used herein, a "word" refers to any predefined number of bits.) Each line has associated with it an address tag that uniquely identifies which line of system memory it is a copy of. When a read request originates in the processor for a new word, whether it be data or instruction, an address tag comparison is made to determine whether a copy of the requested word resides in a line of the cache memory. If present, the data is used directly from the cache. This event is referred to as a cache read "hit". If not present, a line containing the requested word is retrieved from system memory and is stored in the cache memory. The requested word is simultaneously supplied to the processor. This event is referred to as a cache read "miss", and results in a line-fill into the cache.
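By way of illustration only, the read hit and read miss events described above may be modeled in software as follows. The sketch assumes a direct-mapped organization, a line of four words, and eight lines; none of these parameters is taken from the description above, and the names ReadCache, LINE_WORDS and NUM_LINES are hypothetical.

```python
LINE_WORDS = 4   # words per line (assumed for illustration)
NUM_LINES = 8    # number of memory sections in the cache (assumed)


class ReadCache:
    """Hypothetical direct-mapped cache model covering reads only."""

    def __init__(self, memory):
        self.memory = memory                 # list of words modeling system memory
        self.tags = [None] * NUM_LINES       # address tag per line, None when empty
        self.lines = [None] * NUM_LINES      # cached copy of each line

    def read(self, addr):
        line_addr = addr // LINE_WORDS       # which line of system memory
        index = line_addr % NUM_LINES        # which cache section may hold it
        tag = line_addr // NUM_LINES         # identifies the line within that section
        offset = addr % LINE_WORDS           # word position within the line

        # Address tag comparison: is a copy of the requested word present?
        if self.tags[index] == tag:
            return self.lines[index][offset], "hit"

        # Read miss: line-fill from system memory, then supply the word.
        base = line_addr * LINE_WORDS
        self.lines[index] = self.memory[base:base + LINE_WORDS]
        self.tags[index] = tag
        return self.lines[index][offset], "miss"


if __name__ == "__main__":
    cache = ReadCache(list(range(100, 164)))
    print(cache.read(5))   # first access to the line: miss
    print(cache.read(6))   # same line: hit
```

In this model a second read to any word of a line that has just been filled is reported as a hit, while a read to a different line that maps to the same section replaces the earlier line.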
When the processor desires to write data to memory, a similar address tag comparison is made to determine whether the line into which data is to be written resides in the cache memory. If not present, the data may be written directly into the system memory and/or the corresponding line may be fetched into the cache memory from the system memory to allow the data to be written into that line of the cache memory. This event is referred to as a cache write "miss". If the line is present, the data is written directly into the cache memory. This event is referred to as a cache write "hit". In many systems, a dirty bit for the cache line is then set. The dirty bit indicates that data stored within the line is dirty (i.e., has been modified and is inconsistent with system memory), and thus, before the line is deleted from the cache memory or overwritten, the modified data must be written back to system memory.
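The write behavior and the role of the dirty bit may likewise be sketched as follows. The sketch assumes a direct-mapped cache that, on a write miss, fetches the line before writing into it, which is only one of the alternatives mentioned above. The names WriteBackCache, LINE_WORDS and NUM_LINES are hypothetical.

```python
LINE_WORDS = 4   # words per line (assumed for illustration)
NUM_LINES = 8    # number of lines in the cache (assumed)


class WriteBackCache:
    """Hypothetical direct-mapped cache model with one dirty bit per line."""

    def __init__(self, memory):
        self.memory = memory                 # list of words modeling system memory
        self.tags = [None] * NUM_LINES
        self.lines = [None] * NUM_LINES
        self.dirty = [False] * NUM_LINES

    def _write_back_if_dirty(self, index):
        # Before a line is overwritten, modified data goes back to system memory.
        if self.tags[index] is not None and self.dirty[index]:
            base = (self.tags[index] * NUM_LINES + index) * LINE_WORDS
            self.memory[base:base + LINE_WORDS] = self.lines[index]
        self.dirty[index] = False

    def write(self, addr, value):
        line_addr = addr // LINE_WORDS
        index = line_addr % NUM_LINES
        tag = line_addr // NUM_LINES

        if self.tags[index] == tag:
            event = "hit"
        else:
            event = "miss"
            self._write_back_if_dirty(index)
            base = line_addr * LINE_WORDS
            self.lines[index] = self.memory[base:base + LINE_WORDS]  # fetch the line
            self.tags[index] = tag

        self.lines[index][addr % LINE_WORDS] = value
        self.dirty[index] = True             # line is now inconsistent with memory
        return event
```

In this model system memory continues to hold the old value after a write, and receives the modified data only when the dirty line is replaced by another line.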
As stated previously, another feature which has led to improvements in the performance of computer systems is code prefetching. Code prefetch techniques involve the transfer of instruction code from the system memory (or the cache memory) into a temporary buffer, even though the processor has not yet actually requested the prefetched instruction code. Typically, code is prefetched by reading a line of code within the program memory which immediately follows the line containing the currently-executing instruction. Thus, if the code is executing in a sequential manner, as is the typical situation, the prefetched code may be provided directly from the temporary buffer when the processor requests it, rather than having to read the code from system memory. As a result, the overall speed of the computer system may be enhanced.
When prefetching is used to obtain execution code, the system memory is typically controlled to provide the entire line containing the prefetched code at a high data transfer rate. This is typically accomplished by burst memory access. As is well-known, during the data phase of a burst memory access cycle, a new word is provided to the system bus from system memory for each of several successive clock cycles without intervening address phases. The fastest burst cycle (no wait state) requires two clock cycles for the first word (one clock for the address, one clock for the corresponding word), with subsequent words returned from sequential addresses on every subsequent clock cycle. For systems based on the particularly popular model 80486 microprocessor, a total of four "doublewords" are typically transferred during a given burst cycle.
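The clock counts stated above may be worked through with a short calculation. The sketch below models only the zero wait state case; the comparison figure for non-burst transfers assumes that every word requires its own address clock and data clock, which is an assumption made here for illustration. The function names are hypothetical.

```python
def burst_clocks(words):
    """Clocks for a zero wait state burst: 2 for the first word, 1 for each later word."""
    if words < 1:
        raise ValueError("a burst transfers at least one word")
    return 2 + (words - 1)


def non_burst_clocks(words):
    """Assumed comparison case: each word needs its own address and data clock."""
    if words < 1:
        raise ValueError("at least one word must be transferred")
    return 2 * words


if __name__ == "__main__":
    # Four doublewords, as in the typical 80486 burst cycle described above.
    print(burst_clocks(4), non_burst_clocks(4))
```

For a burst of four doublewords the calculation gives five clock cycles, against eight under the assumed non-burst model.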
Referring to FIG. 1, a block diagram of an exemplary computer system 10 including a cache memory 12 and a sequential code prefetcher 14 is shown. The cache memory 12 and sequential code prefetcher 14 are coupled to a CPU core 16 and to a bus interface unit 18. A system memory 20 is further coupled to bus interface unit 18 via system bus 22.
Sequential code prefetcher 14 is provided to prefetch a sequential line of code within a program segment of system memory 20. That is, when CPU core 16 is executing a particular instruction, sequential code prefetcher 14 fetches a line of code which is sequential to that containing the currently executing instruction. Since it is probable that the next instruction requested by the CPU core 16 will be sequential with respect to the previously executed instruction, the sequential code prefetcher 14 advantageously allows the sequential instructions to be provided to CPU core 16 much more quickly in comparison to a case in which the CPU core must fetch the code from system memory 20 when needed. The sequential code prefetcher 14 as illustrated in FIG. 1 may include a sequential line adder for determining the address value of the sequential line to be prefetched.
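The function of the sequential line adder may be illustrated by the following sketch, which assumes a 16-byte line (four doublewords of four bytes each) and a temporary buffer holding a single line. The names SequentialPrefetcher, next_line_address and fetch_line are hypothetical and do not correspond to elements of FIG. 1.

```python
LINE_BYTES = 16  # assumed line size: four 4-byte doublewords


def next_line_address(current_addr):
    """Sequential line adder: base address of the line following the current one."""
    line_base = current_addr & ~(LINE_BYTES - 1)   # clear the offset bits
    return line_base + LINE_BYTES


class SequentialPrefetcher:
    """Hypothetical one-line prefetch buffer filled with the next sequential line."""

    def __init__(self, fetch_line):
        self.fetch_line = fetch_line   # callable: line address -> line contents
        self.buffer_addr = None
        self.buffer = None

    def on_execute(self, current_addr):
        # Prefetch the line after the one holding the executing instruction.
        target = next_line_address(current_addr)
        if target != self.buffer_addr:
            self.buffer = self.fetch_line(target)
            self.buffer_addr = target

    def lookup(self, addr):
        # Return the buffered line if it contains addr, otherwise None.
        base = addr & ~(LINE_BYTES - 1)
        return self.buffer if base == self.buffer_addr else None
```

When execution proceeds sequentially, a lookup for the following line is satisfied from the buffer; a lookup for any other line returns nothing and must be serviced from system memory.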
Although the prefetching of sequential code as described above has been quite successful in improving overall performance of computer systems, such sequential code prefetching is ineffective when a branch or a jump instruction is encountered by the microprocessor. When a branch or a jump is taken, the next line of code to be executed will not be in address sequence with respect to the address of the branch or jump instruction itself. As a result, the processor may "stall" since it may need to wait for the non-sequential code corresponding to the branch target address to be fetched from system memory. For software programs in which a large number of jump and/or branch instructions are employed, the stalling of the processor often has a substantial impact upon performance.
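The effect of taken branches on a purely sequential prefetcher may be illustrated by counting, for a trace of instruction addresses, the line changes that the prefetched line does not cover. The sketch assumes a 16-byte line, a buffer holding only the next sequential line, and that the very first line fetched is not counted as a stall. The function name count_stalls is hypothetical.

```python
LINE_BYTES = 16  # assumed line size


def count_stalls(trace):
    """Count line changes not satisfied by the sequentially prefetched line."""
    stalls = 0
    current = None       # line holding the currently executing instruction
    prefetched = None    # line held in the prefetch buffer
    for addr in trace:
        line = addr // LINE_BYTES
        if line != current:
            # Execution moved to a new line; a stall occurs unless it was prefetched.
            if current is not None and line != prefetched:
                stalls += 1
            current = line
        prefetched = line + 1   # sequential prefetch of the following line
    return stalls


if __name__ == "__main__":
    sequential = [0, 4, 8, 12, 16, 20]
    with_jumps = [0, 4, 0x100, 0x104, 0x110, 0]
    print(count_stalls(sequential), count_stalls(with_jumps))
```

Under these assumptions the sequential trace produces no stalls, while each taken jump in the second trace produces one.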
In an attempt to alleviate the problems of sequential code prefetching when branch or jump instructions are encountered, branch prediction logic has been proposed for incorporation within computer systems. Within such branch prediction logic, when a branch or a jump instruction is decoded and recognized by the processor, the prefetch unit is controlled to prefetch a line of code based upon a history of whether the particular branch or jump instruction had been taken previously and, if so, based upon the target address previously taken. Unfortunately, this branch prediction logic is typically associated with several problems. First, the branch prediction logic is usually quite complex, thus increasing the cost and degrading the overall reliability of the integrated circuit. Second, the prefetch unit must decode a particular instruction to determine whether or not it is a branch or jump instruction before the prefetch of the target address code can be initiated.
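The history-based selection of a prefetch address described above may be sketched as follows. The sketch is a deliberately simplified model, assuming a 16-byte line and a history that remembers only the most recent outcome of each branch; practical branch prediction logic is considerably more elaborate. The names BranchHistory, record and prefetch_address are hypothetical.

```python
LINE_BYTES = 16  # assumed line size


class BranchHistory:
    """Hypothetical history: branch address -> target address last taken."""

    def __init__(self):
        self.history = {}

    def record(self, branch_addr, taken, target=None):
        # Remember the target if the branch was taken, otherwise forget it.
        if taken:
            self.history[branch_addr] = target
        else:
            self.history.pop(branch_addr, None)

    def prefetch_address(self, branch_addr):
        # Called only after the instruction has been decoded as a branch or jump.
        target = self.history.get(branch_addr)
        if target is not None:
            return target & ~(LINE_BYTES - 1)                  # line of the previous target
        return (branch_addr & ~(LINE_BYTES - 1)) + LINE_BYTES  # next sequential line
```

Even in this reduced form, the lookup can be performed only once the instruction at the given address is known to be a branch or a jump, which reflects the decoding requirement noted above.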
A prefetch mechanism and technique are accordingly desirable for a computer system wherein code may be prefetched without solicitation by the execution unit and wherein non-sequential code may be prefetched which corresponds to a target address of a jump or branch instruction. A prefetch mechanism and technique are further desirable wherein the prefetch of the target address instruction of a branch or a jump instruction may be initiated without first requiring that the branch or the jump instruction itself be decoded. Finally, a prefetch mechanism and technique are desirable which may be implemented with relatively simple circuitry.