1. Field of the Invention
The present invention relates generally to processor architectures and, more particularly, to a method for ensuring the consistency of an instruction cache of a microprocessor.
2. Description of the Related Art
As shown in FIG. 1, many current RISC processor configurations are implemented with an instruction cache (i-cache) 102 interposed between a main memory 104 and an execution unit 108. The i-cache 102 receives instructions from the main memory 104 and temporarily stores the instructions that are to be executed by the execution unit 108. Typically, the i-cache 102 is a primary level cache in the sense that a slower secondary level external cache (e-cache) 106 is located between the main memory 104 and the i-cache 102. Both writes (stores) to and reads (fetches) from the main memory 104 must occur through the e-cache 106. Additionally, prefetching schemes are employed whereby the next instructions are fetched from the i-cache 102 into an instruction queue 110 before the current instruction is complete.
"Consistency" of the instructions held in the i-cache concerns the case in which a store instruction writes to the address of a later instruction that has already been loaded into the i-cache. Suppose, for example, that the i-cache 102 is configured to hold 256 lines (instructions) at a time. Suppose further that instructions 60 and 130 of the program sequence may be characterized as follows:
      .
      .
  60. Store instruction X at address A
      .
      .
      .
 130. Execute instruction at address A
      .
      .
      .
When instruction 60 is executed, the processor will access the main memory 104 to update the data at address A with instruction X. However, since the prior data at address A has already been loaded into the i-cache 102, this new data (instruction X) will not execute unless the i-cache 102 is updated as well.
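The stale-instruction scenario just described can be illustrated with a minimal sketch. It is a toy software model, not a description of any actual hardware: the class `ToyMachine`, its methods, and the string values are hypothetical names introduced only for illustration, and the e-cache 106 and instruction queue 110 are omitted for brevity.

```python
class ToyMachine:
    """Toy model: a main memory plus an i-cache that stores do not update."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> instruction (main memory)
        self.icache = {}             # address -> cached copy of the instruction

    def fetch(self, addr):
        # Instruction fetch: fill the i-cache on a miss, then read from it.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # Store: updates main memory only; the i-cache is not checked.
        self.memory[addr] = value


machine = ToyMachine({"A": "old instruction"})
machine.fetch("A")                     # address A is now resident in the i-cache
machine.store("A", "instruction X")    # step 60: store instruction X at address A
executed = machine.fetch("A")          # step 130: execute instruction at address A
print(executed)                        # the cached copy, not the newly stored one
```

In this model main memory holds instruction X after step 60, yet the fetch at step 130 still returns the copy that was cached earlier.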
When a program stores operand data, a sophisticated mechanism maintains data consistency, ensuring that a subsequent load of the same data returns what was just stored. This technique is generally not applied to the i-cache because the mechanism is complicated and expensive, and because it is fairly rare for a program to store into an instruction stream that will later be executed as part of the program.
If an instruction is stored to an address that is already resident in the i-cache, then whatever was stored must eventually appear as the instruction at that address when it is executed. Nevertheless, it is preferable not to have in place a complex and expensive mechanism that makes the new data instantly visible in the i-cache for every store on every processor in a multiprocessor system, particularly to handle something that occurs so rarely.
Many conventional systems use a so-called "flush instruction" to ensure i-cache consistency. That is, while some processors check each store against the i-cache, most RISC processors have no dedicated mechanism for carrying out this check and instead provide special instructions that cause the check to happen. After the program executes a store, it is required to execute a flush instruction at that address before it can be certain that the stored instruction will be seen in its instruction stream. The "flush instruction" goes out on the multiprocessor bus as an exclusive request. As such, the software must know that it is storing to the instruction stream, and it must execute a flush instruction before the hardware guarantees that the store will be visible.
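The store-then-flush discipline can be sketched on a single processor as follows. This is an illustrative toy model under stated assumptions: `ToyCpu`, `flush`, and `patch_instruction` are hypothetical names, the flush is modeled simply as invalidating the i-cache entry for one address, and the bus-level exclusive request is not modeled.

```python
class ToyCpu:
    """Toy single-processor model with an explicit flush operation."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> instruction (main memory)
        self.icache = {}             # address -> cached copy of the instruction

    def fetch(self, addr):
        # Fill the i-cache on a miss, then return the cached instruction.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # The store itself is not checked against the i-cache.
        self.memory[addr] = value

    def flush(self, addr):
        # Assumed behavior: drop any cached copy so the next fetch refills it.
        self.icache.pop(addr, None)


def patch_instruction(cpu, addr, new_instruction):
    # Software knows it is writing to the instruction stream,
    # so it pairs the store with a flush of the same address.
    cpu.store(addr, new_instruction)
    cpu.flush(addr)


cpu = ToyCpu({"A": "old instruction"})
cpu.fetch("A")                            # address A becomes resident in the i-cache
patch_instruction(cpu, "A", "instruction X")
after_flush = cpu.fetch("A")              # refilled from memory after the flush
print(after_flush)
```

The burden falls on software: if `patch_instruction` omitted the call to `flush`, the model would keep returning the previously cached instruction.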
These machines experience a problem if the process migrates: because it is unknown whether the process migrated between the store and the flush, the flush instruction must propagate to the interconnect. In a multiprocessor system, the i-caches of all the other processors are consistent; only the i-cache of the processor that performed the store is not.
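One possible reading of the migration problem is sketched below. The model rests on assumptions that go beyond the text: other processors are assumed to invalidate their i-cache entry when they observe a store, the storing processor is assumed not to check its own i-cache, and the `broadcast` parameter is a hypothetical switch between a flush that stays local and one that propagates over the interconnect. All names are illustrative.

```python
class Cpu:
    def __init__(self, system):
        self.system = system
        self.icache = {}             # address -> cached copy of the instruction

    def fetch(self, addr):
        if addr not in self.icache:
            self.icache[addr] = self.system.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        self.system.memory[addr] = value
        # Assumption: every OTHER processor observes the store and drops its
        # copy; the storing processor does not check its own i-cache.
        for cpu in self.system.cpus:
            if cpu is not self:
                cpu.icache.pop(addr, None)

    def flush(self, addr, broadcast):
        self.icache.pop(addr, None)
        if broadcast:
            # Assumption: the flush goes out on the interconnect to all CPUs.
            for cpu in self.system.cpus:
                cpu.icache.pop(addr, None)


class System:
    def __init__(self, memory, n_cpus=2):
        self.memory = dict(memory)
        self.cpus = [Cpu(self) for _ in range(n_cpus)]


def run(broadcast):
    system = System({"A": "old instruction"})
    cpu0, cpu1 = system.cpus
    cpu0.fetch("A")                       # A is resident in cpu0's i-cache
    cpu0.store("A", "instruction X")      # the process stores while on cpu0
    cpu1.flush("A", broadcast=broadcast)  # the process migrated; flush runs on cpu1
    return cpu0.fetch("A")                # what cpu0 would execute afterwards


local_only = run(broadcast=False)
propagated = run(broadcast=True)
print(local_only, "|", propagated)
```

Under these assumptions, a flush that stays local to the processor executing it cannot repair the i-cache of the processor that performed the store, which is why the flush has to reach the interconnect.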
The conventional techniques thus suffer the drawbacks of being complex and expensive, requiring a high-bandwidth implementation, and/or exhibiting migration-related problems.