Many state-of-the-art microprocessors employ general principles of superscalar design, such as described in the book "SuperScalar Microprocessor Design" by Mike Johnson, Prentice-Hall, Inc., 1991. Often, these modern microprocessors utilize techniques of "prefetching" to expedite the fetching of instructions from memory. With prefetching, instructions are typically accessed from main memory, or perhaps from an instruction cache, before the microprocessor has determined which instructions actually need to be fetched. With a "static" prefetch, the microprocessor prefetches the next one or two instructions which are stored sequentially in memory following the last verified instruction. If the prefetched instruction is subsequently verified as a correct instruction, then processing speed is improved because the instruction, once verified, is immediately available for execution. On the other hand, if the prefetched instruction is subsequently determined to be incorrect, perhaps because an instruction branch occurred, it is squashed and the correct instruction is then fetched. Since instructions typically are stored in memory in sequential order, a static prefetch mechanism which prefetches the next sequentially ordered instruction is often successful in improving processor efficiency. With some state-of-the-art microprocessors, a static prefetch mechanism can fetch five or more instructions ahead of the last verified instruction.
A problem can arise with static prefetching if memory mapped input/output (I/O), or other storage locations subject to memory accessing side effects, are located in the vicinity of instructions to be fetched. With static prefetching, it is possible to attempt a prefetch from a memory mapped I/O location even though the location was not intended to store an instruction. This can occur, for example, when an actual instruction stream branches over a portion of memory containing memory mapped I/O. Therefore a static instruction prefetch which fetches several instructions ahead of the last verified instruction might fetch into the memory mapped I/O region of memory before the branch is detected.
This problem is illustrated in FIG. 1, which shows three pages of memory: pages 1, 2, and 3. Pages 1 and 3 store instructions, while page 2 stores memory mapped I/O. Branch instruction 4, stored near the end of page 1, indicates a branch to instruction 5 in page 3. Arrow 6 illustrates the branch into page 3. The problem that can arise, however, is that the static prefetch (shown by arrow 7) might fetch from memory mapped I/O locations in page 2 before branch 4 is detected. In such a case, the memory mapped I/O locations may be corrupted.
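The situation of FIG. 1 can be sketched in a few lines of Python. This is an illustrative model only, not a description of any particular hardware: the 4096-byte page size, the fixed 4-byte instruction length, the prefetch depth of five, and the names `PAGE_KIND`, `static_prefetch`, and `mmio_hits` are all assumptions introduced for the example.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
INSN_SIZE = 4      # assumed fixed instruction length in bytes

# Hypothetical layout mirroring FIG. 1: page index -> kind of contents.
PAGE_KIND = {1: "code", 2: "mmio", 3: "code"}


def page_of(addr):
    """Index of the page that contains byte address `addr`."""
    return addr // PAGE_SIZE


def static_prefetch(last_verified, depth):
    """Addresses a static prefetcher would request: the next `depth`
    sequential instruction slots after the last verified instruction."""
    return [last_verified + INSN_SIZE * i for i in range(1, depth + 1)]


def mmio_hits(addresses):
    """Subset of the requested addresses that fall in a memory mapped I/O page."""
    return [a for a in addresses if PAGE_KIND.get(page_of(a)) == "mmio"]


# The branch sits in the last instruction slot of page 1.
branch_addr = 2 * PAGE_SIZE - INSN_SIZE
requested = static_prefetch(branch_addr, depth=5)
hits = mmio_hits(requested)
```

Under these assumptions every one of the five sequentially prefetched addresses lands in page 2, so the memory mapped I/O locations are read before the branch out of page 1 has been detected.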
To avoid such problems, computer designers have used several different strategies. According to a first scheme, prefetch is constrained to proceed no more than a fixed number of instructions ahead of the last executed instruction. For example, this design solution may be implemented by requiring at least 128 bytes between the branch to A and the first memory mapped I/O location, as shown by dashed line 8 in FIG. 1. The drawback of this approach, however, is that it restricts performance, since the number of instructions that may be prefetched is reduced.
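The trade-off in this first scheme can be made concrete with a small check. The sketch below is hypothetical: it assumes a fixed 4-byte instruction length and measures the 128-byte separation from the first byte after the branch instruction; the function names and the example addresses are introduced only for illustration.

```python
INSN_SIZE = 4      # assumed fixed instruction length in bytes
GUARD_BYTES = 128  # separation figure quoted in the text


def furthest_prefetch_end(branch_end, depth, insn_size=INSN_SIZE):
    """First byte address past the last of `depth` sequentially
    prefetched instructions that follow the branch."""
    return branch_end + depth * insn_size


def layout_is_safe(branch_end, first_mmio, depth, insn_size=INSN_SIZE):
    """True if a prefetch of `depth` instructions stops short of the
    first memory mapped I/O location."""
    return furthest_prefetch_end(branch_end, depth, insn_size) <= first_mmio


def max_safe_depth(guard_bytes=GUARD_BYTES, insn_size=INSN_SIZE):
    """Largest prefetch depth that the guard region can absorb."""
    return guard_bytes // insn_size


# Hypothetical addresses: the guard region ends where the I/O begins.
branch_end = 0x1F80
first_mmio = branch_end + GUARD_BYTES
```

With these assumed values the guard region absorbs at most 32 prefetched instructions; a deeper prefetch would reach the memory mapped I/O, which is why the scheme caps prefetch depth and thereby limits performance.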
A second scheme for avoiding memory mapped I/O side effect problems involves constraining prefetch so that it does not cross "page" boundaries. In other words, code and memory mapped I/O are constrained so as not to coexist within the same page. Unfortunately, although many computer architectures (e.g., the Intel x86 architecture) implement 4K byte pages for memory translation, code and memory mapped I/O have already been intertwined at a finer (e.g., 1K) granularity. Furthermore, prohibiting prefetch across such boundaries reduces performance in the common case where there are no memory mapped I/O side effects to be considered.
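Both the page-boundary rule and its weakness at finer granularity can be sketched as follows. The model is an assumption-laden illustration: the 4-byte instruction length, the specific addresses, and the names `prefetch_within_page`, `MMIO_START`, and `MMIO_END` are hypothetical and chosen only to show a 1K I/O region sharing a 4K page with code.

```python
PAGE_SIZE = 4096   # 4K byte translation page, as in the text
INSN_SIZE = 4      # assumed fixed instruction length in bytes


def prefetch_within_page(last_verified, depth):
    """Sequential prefetch that stops at the first address outside the
    page containing the last verified instruction."""
    page = last_verified // PAGE_SIZE
    addresses = []
    for i in range(1, depth + 1):
        addr = last_verified + i * INSN_SIZE
        if addr // PAGE_SIZE != page:
            break  # never cross a page boundary
        addresses.append(addr)
    return addresses


# Case 1: near the end of a page, the rule cuts the prefetch short.
near_page_end = prefetch_within_page(2 * PAGE_SIZE - 8, depth=5)

# Case 2: a hypothetical 1K memory mapped I/O region inside a 4K code page.
MMIO_START, MMIO_END = 5120, 6144
inside_page = prefetch_within_page(MMIO_START - 8, depth=5)
leaked = [a for a in inside_page if MMIO_START <= a < MMIO_END]
```

In the first case only one of the five requested instructions is prefetched, illustrating the performance cost. In the second case the page rule is satisfied, yet four of the five prefetched addresses still fall inside the 1K I/O region.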
Additional prefetching problems arise within microprocessors capable of performing branch prediction. With branch prediction, a microprocessor attempts to predict which direction of a branch to execute before the controlling condition, e.g., of an IF statement, is resolved. If prefetching is based upon a predicted branch which is subsequently determined to be incorrect, then memory mapped I/O may be accidentally prefetched before the branch misprediction can be detected.
This problem is illustrated graphically in FIG. 2. More specifically, FIG. 2 illustrates three pages of memory, generally denoted 11, 12 and 13, respectively. Pages 11 and 13 store instructions, whereas page 12 stores memory mapped I/O. Instruction 14 stored within page 11 indicates a branch either to instruction 15 within the memory mapped I/O page 12 or to instruction 16 within page 13. The correct branch to be taken is the branch to memory instruction 16 (identified by solid arrow 17). However, as a result of an incorrect branch prediction, the microprocessor may predict that a branch to memory instruction 15 within the memory mapped I/O page is to be taken. The incorrect branch is denoted by dashed line 18. As can be appreciated, a prefetch based upon the mispredicted branch will result in a prefetch from memory mapped I/O, which may have undesirable side effects resulting in corruption of data or other non-recoverable errors.
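The side effect of such a mispredicted prefetch can be illustrated with a toy model. Everything in the sketch is hypothetical: the `ReadSensitiveFifo` class stands in for any memory mapped device whose contents change when read, and the keys 15 and 16 merely echo the reference numerals of FIG. 2 rather than real addresses.

```python
from collections import deque


class ReadSensitiveFifo:
    """Hypothetical memory mapped device: every read consumes one entry."""

    def __init__(self, items):
        self.items = deque(items)

    def read(self):
        return self.items.popleft() if self.items else None


def speculative_fetch(predicted, actual, memory):
    """Prefetch from the predicted target before the branch resolves,
    then squash and refetch if the prediction turns out to be wrong."""
    fetched = memory[predicted]()      # prefetch occurs before resolution
    if predicted != actual:
        fetched = memory[actual]()     # squash, then fetch the correct target
    return fetched


fifo = ReadSensitiveFifo(["status-A", "status-B"])
memory = {
    15: fifo.read,                     # location in the memory mapped I/O page
    16: lambda: "instruction 16",      # location in the code page
}
result = speculative_fetch(predicted=15, actual=16, memory=memory)
remaining = list(fifo.items)
```

The squash restores the correct instruction stream, but the entry consumed from the device by the speculative read is not restored, which is the kind of non-recoverable side effect described above.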
At this point one might wonder why a processor would ever predict a branch to a memory mapped I/O location. After all, if the processor were guaranteed to predict only branch targets which are executable code, an illegal prefetch from memory mapped I/O could not occur. Practitioners in the art should understand, however, that there are numerous situations in which it is convenient to allow a prediction to be made to a location which is not necessarily legal code. By way of example, one such situation arises where a location that previously stored code is now memory mapped I/O. This could occur, for example, as a result of bank switching or virtual address remapping. If a prediction could not be made to this location, branch target buffer (BTB) invalidation might be required whenever a bank switch or virtual address remapping occurs.
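The stale-prediction scenario can be sketched with a minimal model of a branch target buffer. The `BranchTargetBuffer` class, the `region_kind` table, and the addresses below are assumptions made for illustration and do not describe the organization of any actual BTB.

```python
class BranchTargetBuffer:
    """Hypothetical BTB: maps a branch address to its last observed target."""

    def __init__(self):
        self.entries = {}

    def record(self, branch_addr, target):
        self.entries[branch_addr] = target

    def predict(self, branch_addr):
        return self.entries.get(branch_addr)

    def invalidate_all(self):
        self.entries.clear()


# Hypothetical map of what currently backs each target address.
region_kind = {0x3000: "code"}

btb = BranchTargetBuffer()
btb.record(branch_addr=0x1000, target=0x3000)   # learned while 0x3000 held code

# A bank switch or remapping now places memory mapped I/O at the same address.
region_kind[0x3000] = "mmio"

stale_target = btb.predict(0x1000)
stale_kind = region_kind[stale_target]

# The alternative: flush the BTB on every bank switch or remapping.
btb.invalidate_all()
after_flush = btb.predict(0x1000)
```

The BTB has no knowledge of the remapping, so its entry continues to point at an address that now holds memory mapped I/O. Flushing on every remap removes the stale entry, but it also discards every other learned prediction.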
Because there are good reasons for allowing a BTB to make dynamic predictions that may possibly be incorrect, it becomes imperative to provide some mechanism which prevents prefetch from memory mapped I/O, so as to avoid the undesirable side effects described above. Accordingly, there is an unfulfilled need for an improved method of prefetching instructions, particularly one which permits some prefetching based on branch predictions and does not impose significant programming restrictions on the computer system.